SECTION 1


INTRODUCTION  AND  OVERVIEW 

An  important  quality  attribute  of  a  computer  system 
is  the  degree  to  which  it  can  be  depended  upon  to  perform 
its  intended  function  in  the  specified  environment. 

Evaluation and prediction of this attribute have
concerned computer designers and users since the early
days of computer systems. Until the late sixties,
attention was focused almost solely on the performance of
the hardware aspects of the system. In the early seventies,
software became the center of attention due to a continuing
increase in the ratio of software to hardware costs, in
both the production and the operational phases.

The performance of a software system is dependent
on the tools and methodologies used during its development,
and an important measure of performance is the
nature and frequency of software errors.

This report is primarily concerned with the development
of stochastic models for describing the software error
occurrence phenomenon and determining software reliability.
A description of software errors and their sources is
given in Section 1.1 and an error classification scheme
is described in Section 1.2. The notion of software
reliability is discussed in Section 1.3. Currently
suggested approaches for enhancing software reliability
are given in Section 1.4. A description of software
reliability models reported in the literature is given
in Section 1.5.

A  non-homogeneous  Poisson  process  (NHPP)  model 
based  on  an  exponentially  decaying  error  occurrence 
rate  is  developed  in  Section  2.  Many  useful  software 
performance  measures  are  developed  and  several  software 
failure  data  sets  are  analyzed  to  show  the  applicability 
and  usefulness  of  this  model. 

In Section 3, another NHPP model is proposed which
can be used to model both the increasing and the decreasing
failure rates during the software integration testing
phase.

The problem of when to stop testing and start using
software is discussed in Section 4. Various useful
scenarios are considered and optimum release time policies
are developed. The results are illustrated via
numerical examples.

A  related  problem  of  modelling  the  total  hardware- 
software  system  is  addressed  in  Appendix  A.  This  task 
liability modelling


1.1  SOFTWARE  ERRORS  AND  THEIR  SOURCES 

Software (also called a program) is essentially an
instrument for transforming a discrete set of inputs
into a discrete set of outputs (see Figure 1.1). It
comprises a set of coded statements whose function
may basically be one of the following:

1.  Evaluate  an  expression  and  store  the  result 
in  a  temporary  or  permanent  location. 

2.  Decide  which  statement  to  execute  next. 

3.  Perform  input/output  operations. 

Since software is, to a large extent, produced by humans,
the finished software product is often imperfect. It is
imperfect in the sense that a discrepancy exists between
what the software can do and what the user or the
computing environment wants it to do. The computing
environment refers to the physical machine, operating system,
compiler and translators, utilities, etc. These
discrepancies are what we call software errors (see Figure
1.2). Basically, software errors can be attributed
to the following:

1.  Ignorance  of  the  user  requirements; 

2.  Ignorance  of  the  rules  of  the  computing 
environment;  and 


3. Poor communication of software requirements
between the user and the programmer or poor
documentation of the software by the programmer.

Even if we know that software contains errors, we may
not know with certainty the exact identity of these errors.

Currently,  there  are  two  major  paths  one  can  follow 
to  expose  software  errors: 

1.  Program  proving,  and 

2.  Program  testing. 

Program proving is more formal and mathematical, while
program testing is more practical and remains heuristic
in its approach. The approach in program proving is the
construction of a finite sequence of logical statements
ending in the statement (usually the output specification
statement) to be proved. Each of the logical statements
is an axiom or is a statement derived from earlier
statements by the application of an inference rule.
Program proving making use of inference rules is known as
the Inductive Assertion Method. This method was mainly
popularized by Floyd, Hoare, Dijkstra and, more recently,
Reynolds. Other work on program proving concerns the
Symbolic Execution Method, which is the basis of some
automatic program verifiers. Despite the formalism and
mathematical exactness of program proving, it is still an
imperfect tool for verifying program correctness. Gerhart
and Yelowitz [GER76] showed several programs which were
proven to be correct but still contained errors. The
errors were due to failures in defining what exactly to
prove and were not failures of the mechanics of the proof
itself.
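
To make the idea of input, loop and output assertions concrete, the following minimal Python sketch annotates an integer division routine. The function name and the choice of example are illustrative assumptions, not taken from the cited work. The runtime `assert` statements only check the assertions on the inputs actually executed; a proof would establish them for all inputs.

```python
def integer_divide(dividend, divisor):
    """Divide by repeated subtraction, annotated with assertions.

    Hypothetical example: the assertions mirror the input assertion,
    loop invariant and output assertion used in an inductive proof.
    """
    # Input assertion (precondition).
    assert dividend >= 0 and divisor > 0

    quotient, remainder = 0, dividend
    while remainder >= divisor:
        # Loop invariant: holds every time control reaches this point.
        assert dividend == quotient * divisor + remainder and remainder >= 0
        remainder -= divisor
        quotient += 1

    # Output assertion (postcondition): invariant plus negated loop test.
    assert dividend == quotient * divisor + remainder
    assert 0 <= remainder < divisor
    return quotient, remainder
```

As the text notes, such a proof is only as good as the assertions chosen: if the output assertion does not capture what the user actually needs, the routine can be "proven" and still be wrong.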

Program testing is the symbolic or physical execution
of a set of test cases with the intent of exposing
embedded errors (if any) in the program. Like program
proving, program testing remains an imperfect tool for
verifying program correctness. A given testing strategy
is good for exposing certain kinds of errors but not all
possible kinds of errors in a program. An advantage of
testing is that it provides accurate information about a
program's actual behavior in its actual computing
environment; proving is limited to conclusions about the
program's behavior in a postulated environment.

Neither proving nor testing can, in practice, guarantee
complete confidence in the correctness of programs.
Each has its pluses and minuses. They should not be
viewed as competing tools. They are, in fact, complementary
methods for decreasing the likelihood of program failure
[GOO77].


1.2  SOFTWARE  ERROR  CLASSIFICATION 


A systematic study of software errors in a program
requires knowing what specifically these errors are and
knowing which tool(s) to use to expose particular types
of software errors. Software errors can be grouped as
syntax, semantic, runtime, specification and performance
errors.


1.2.1  Syntax  Errors 

These errors are due to discrepancies between the
program code and the syntax rules governing the parser
or lexical analyzer of a program translator. These are
the easiest errors to detect. They can be detected by
visual inspection of the code or mechanically during
the program compilation process. Experienced
programmers rarely commit syntax errors.

1.2.2  Semantic  Errors 

These errors are due to discrepancies between the
program code and what the semantic analyzer of the computing
environment accepts. Among the popular kinds of semantic
errors are typechecking errors and implementation
restriction errors. Again, they may be detected by the
semantic analyzer of a program translator or by visual
inspection.


Syntax and semantic errors are detected during the
compilation stage of a program. A program having syntax
and/or semantic errors cannot be executed. Syntax and
semantic errors are mainly due to ignorance or negligence
on the part of the programmer regarding the restrictions
and limitations of the language (s)he is using.

1.2.3  Runtime  Errors 

As the name implies, runtime errors occur during
the actual running of a program. They may be further
classified into three categories:

Domain  errors 

A domain error occurs whenever the value of a
program variable exceeds its declared range or exceeds
the physical limits of the hardware representing the
variable. The range of a variable is declared implicitly
or explicitly. FORTRAN, for example, assigns
types to variables based on the variable name or based
on a declaration statement. PASCAL requires all variables
to be explicitly declared in a declaration statement.
PASCAL has facilities to declare ranges by enumeration
and/or subsets of numeric domains.

Some program translators produce runtime code for
checking certain types of domain errors. Some have built-in
recovery features for domain errors (e.g. PL/1, COBOL)
and others (e.g. FORTRAN) simply abort execution upon
the occurrence of a domain error. Certain compilers, like
those for PASCAL, automatically check for values outside a
declared range.

Domain  errors  are  a  serious  matter  because 

a)  program  execution  is  aborted,  and/or 

b)  program  results  are  incorrect. 

Execution abortion may be fatal, especially in real-time
systems. Despite their seriousness, domain errors have
never been formally and extensively studied in the
literature. This is because detection of domain errors can
be very difficult. It requires exact specification of the
ranges of the input variables. Also, the test values
required to expose these errors may occur at the input
domain's boundary or inside the input domain itself.
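
The kind of runtime range check described above, and the choice of test values on and around the domain boundary, can be sketched in Python as follows. The names `DomainError`, `checked_assign` and `boundary_test_values` are hypothetical helpers introduced only for illustration; they do not correspond to any particular compiler's generated code.

```python
class DomainError(ValueError):
    """Raised when a value falls outside its declared range."""


def checked_assign(value, low, high):
    """Return value if low <= value <= high, else raise DomainError."""
    if not (low <= value <= high):
        raise DomainError(f"{value} is outside the declared range {low}..{high}")
    return value


def boundary_test_values(low, high):
    """Values just outside, on, and inside the declared range."""
    return [low - 1, low, (low + high) // 2, high, high + 1]


def run_domain_tests(low, high):
    """Apply the range check to each boundary test value."""
    results = {}
    for value in boundary_test_values(low, high):
        try:
            checked_assign(value, low, high)
            results[value] = "ok"
        except DomainError:
            results[value] = "domain error"
    return results
```

For a variable declared with the range 1..12, `run_domain_tests(1, 12)` flags only 0 and 13. Boundary values alone are an assumption of this sketch; as the text points out, an exposing value may also lie inside the domain.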

Computational  errors 

Computational errors, sometimes known as logic errors,
arise whenever the program produces an incorrect output.
The incorrect output may be due to a wrong formula, an
incorrect control flow, assignment to a wrong variable,
incorrect parameter passing, etc.

It is not possible to generate runtime code to detect
computational errors during program execution. This
is because computational errors are really discrepancies
between the program's output and the program's
specifications.




Computational errors due to incorrect program constructs
and statements may be detected by any of the
structure-dependent or structure-independent testing
techniques. However, none of these tools can guarantee
total absence of these types of computational errors in
a program. Computational errors due to missing program
constructs and statements may be detected by any of the
structure-independent testing techniques. Again, none
of these tools can guarantee total absence of computational
errors due to missing paths.

Non-Termination  errors 

A non-termination error is simply the failure of a
program to terminate in finite time without outside
intervention. The most common cause of non-termination
errors is the program running into an infinite loop.
Non-termination can also occur if a set of concurrent
programs falls into a deadlock.

Infinite loops are detected by simply executing
each of the loops in a program. However, this strategy
may not guarantee total absence of infinite loops. Some
infinite loops may only occur if certain program variables
achieve certain values. Program proving may also be used
on certain programs to expose infinite loops. The general
problem of program non-termination remains unsolved.
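
A small Python sketch illustrates a loop that terminates for some input values but not for others. The function `countdown_steps` and its `max_iterations` guard are assumptions made for this illustration; the guard is only a testing aid that reports a suspected infinite loop, not a general method for deciding termination.

```python
def countdown_steps(n, step, max_iterations=10_000):
    """Count how many subtractions of `step` bring `n` to exactly zero.

    The loop test `n != 0` is satisfied forever when `step` does not
    divide `n`, so the loop is infinite only for certain inputs.
    """
    steps = 0
    while n != 0:
        if steps >= max_iterations:
            # Testing aid: give up and report a suspected infinite loop.
            raise RuntimeError("suspected non-termination")
        n -= step
        steps += 1
    return steps
```

Executing the loop once with `countdown_steps(10, 2)` returns normally and would pass a test that merely exercises the loop, while `countdown_steps(10, 3)` never reaches zero and is stopped only by the guard.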


1.2.4  Specification  Errors 


Specification errors result whenever there exists a
discrepancy between the statement of specifications and
the statement of user requirements. A requirements error
exists whenever there is a discrepancy between the
statement of user requirements and the real user
requirements.

Presently, detection of specification errors such as:

1.  Incomplete  specifications, 

2.  Inconsistent  specifications,  and 

3.  Ambiguous  specifications, 

remains  an  informal  process.  This  is  mainly  due  to  the 
nonexistence  of  a  specification  language  powerful  enough 
to  translate  the  user  requirements  into  clear,  complete 
and  consistent  terms. 

A  testing  tool  to  detect  specification  errors  is 
yet  to  be  developed. 

1.2.5  Performance  Errors 

Performance errors exist whenever a discrepancy
exists between the actual performance (efficiency) of the
program and its desired or specified performance. Program
performance may be measured in a number of ways:

1. Response time

2. Elapsed time

3. Memory space usage

4. Working set requirement, etc.

Actually measuring these quantities can be a very
difficult process. Program complexity theory tries to
estimate bounds on the running time of certain program
algorithms. Statistical analysis and simulation can also
be employed to estimate the above performance variables.
However, use of these tools can be very expensive and time
consuming.

A performance testing tool that is economical (timewise
and costwise) to use is yet to be developed.

The most expensive kind of software errors to eliminate
are those which are not discovered until late in the
software development, such as when the software becomes
operational. These are known as persistent software errors.
Glass [GLA81] reported that persistent software errors are
mostly due to the failure of the problem solution (i.e. the
program) to match the complexity of the problem to be
solved (i.e. the user requirements). Examples of such
errors are computational errors due to missing or
insufficient predicates and failure to reset a variable to
some baseline value after its use in a functional logic
segment. The solution to this software problem is beyond
the current state of the art.


1.3 SOFTWARE RELIABILITY

There are a number of conflicting views as to what
software reliability really is and how it should be
quantified. The conflict arises because of disagreement
over the basic definition of the term "software reliability".
Software reliability, in the view of some people, especially
the computer science purists, should be closely tied to the
correctness of software. They argue that incorrect
software (i.e., software still containing errors) is
doomed to fail sooner or later and thus its reliability
should be zero (0). Once the software has been freed of
all errors, its reliability becomes one (1). On the
other hand, software reliability, as viewed by many
engineers, statisticians, and practitioners, should be
closely tied to the concept of "probabilistic reliability".
These groups of people argue that many programs used in the
real world are known to still contain errors and yet
they are executed day after day without occurrences of
failures. Software reliability, they believe, should
be viewed as the probability that a software system will
operate without a failure for a specified (mission) time.

One way to resolve this conflict is to look back
at the original problem in the real world and ask
ourselves the question: "Why do we need to know software
reliability?"

The original real world problem, in very simple terms,
is as follows:

Develop software that will satisfy the user's requirements
in the most efficient way possible (in terms of both time
and money).

The solution to this problem turns out to be very
difficult, basically because of the following facts:

1. Real world software is large and complex.

2. Users are not always 100 percent certain
about their requirements.

3. Resources (time and money) allocated for software
development are always limited.

Even if we know that we need only, say, 2000 test cases to
expose all possible embedded errors in a software system,
chances are that, in the real world, we may not have enough
time and money to perform this exhaustive test. As more and
more errors are uncovered by our testing or correctness
verification process, the additional cost of exposing the
remaining errors rises very fast. Thus, beyond a certain
point it is practically useless to continue testing to
achieve 100 percent correctness. This explains why almost
all software systems in public and private use still have
embedded errors.

If we adopt the point of view of a computer science
purist, then almost all software systems in use (including
those that are accepted as very reliable and useful by their
users) have zero reliability. Since everything now has
zero reliability, the value or usefulness of the software
reliability concept is lost.

The reason why people introduced the concept of software
reliability (or hardware reliability for that matter)
is to have a useful measure that may help us in dealing
with the original real world software (hardware) problem.
This measure is useful in planning and controlling
additional resources (time and money) for enhancing the
reliability of a software system. It is also a useful
measure for giving the user confidence about the software
quality.

Should we, then, adopt the hardware-based concept
of software reliability? One answer to this question at
this point in time is yes, but with extreme care. We
should be careful because there are inherent differences
between software and hardware. Hardware exhibits mixtures
of decreasing and increasing failure rates. The decreasing
failure rate is due to the fact that as use time on the
hardware system accumulates, more and more errors (most
probably design errors) are encountered and fixed. The
increasing failure rate is due primarily to hardware
component wearout. There is no such thing as wearout in
software. It is true that software may become obsolete
because of changes in the user and computing environment,
but once we modify software to reflect these changes, we
are no longer talking of the same software but of an
enhanced or modified version. Like hardware, software
exhibits a decreasing failure rate as the usage time on the
system accumulates and errors (due to design and coding)
are fixed. Thus, a hardware-type approach to software
reliability should be applied only in appropriate
environments.

Suppose  we  declare  that  the  reliability  of  a  given 
software  is  0.95.  What  does  this  number  exactly  mean? 
Following  the  probabilistic  point  of  view,  this  may  mean 
any  one  of  the  following: 

1.  If  we  execute  the  software  several  times,  95 
percent  of  the  time  it  will  give  correct 
results. 

2.  We  are  95  percent  confident  that  the  software 
will  give  correct  results  when  executed. 

The first interpretation is the so-called frequency
interpretation and the second is the so-called subjective
interpretation. Littlewood's contention [LIT80] is that in
the absence of a "scientific" verifiable meaning for the
number 0.95, the only reasonable interpretation is the
subjective interpretation.

The only problem we see with this number is its possible
inconsistency. A software system may have been declared 95
percent reliable by the developer but may have a different
perceived reliability for one user and probably yet another
perceived reliability for another user. A very simple
example will illustrate this point. Suppose a software
system is composed of 100 modules. Because of practical
considerations, the software developer stops testing after
90 modules. He then declares the reliability of the system
as 90 percent. A user buys the system and happens to use in
his particular application some modules (or maybe program
paths) which have not been tested. As a result, 50 percent
of the time, the user gets incorrect results. His perceived
reliability of the system is therefore 50 percent. Another
user might use a different mixture of untested modules
(program paths) and might get a different number for the
reliability measure. The basic question is: What is the
true reliability of the software?
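
The 100-module example can be made concrete with a short Python sketch. It rests on a deliberately crude assumption that is not in the text: an execution gives a correct result if and only if it uses a tested module. The function name, the module numbering and the two usage profiles are hypothetical.

```python
def perceived_reliability(usage_counts, tested_modules):
    """Fraction of a user's executions that give correct results.

    usage_counts maps a module id to the number of executions that
    use it. Assumption (for illustration only): tested modules always
    give correct results and untested modules always give wrong ones.
    """
    total = sum(usage_counts.values())
    correct = sum(count for module, count in usage_counts.items()
                  if module in tested_modules)
    return correct / total


# Modules 0..89 were tested; modules 90..99 were not.
tested = set(range(90))
developer_claim = len(tested) / 100

# Two hypothetical users with different mixtures of modules.
user_a = {0: 50, 95: 50}          # half of the runs hit an untested module
user_b = {1: 80, 92: 10, 97: 10}  # one fifth of the runs do
```

Under these assumptions the developer's figure is 0.9, user A perceives 0.5 and user B perceives 0.8: three different numbers for the same software, each conditioned on a different usage pattern.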

The only way to resolve this question, we feel, is
to further qualify or condition our software reliability
measure. Ultimately, what is more important is that the
user gets his correct results from the software. Thus,
that user should be more concerned about a reliability
measure conditioned with respect to his requirements. The
software developer should be concerned with a reliability
measure conditioned with respect to the intended
specifications of the system. We should remember that the
purpose of the reliability measure is to help in planning
and controlling the production of software and nothing
more. We may pool all the users into one big user (for
example, user of an operating system software) and come up
with an average reliability measure. Still, this number may
not match the developer's measured reliability. If we let
R[S|r] mean the reliability of the software system S with
respect to requirements r, then in general, we have:

$$R[S \mid \text{user requirements}] \neq R[S \mid \text{developer requirements}]$$


1.4  APPROACHES  FOR  ENHANCING  SOFTWARE  RELIABILITY 


Consider the concept of software reliability based
on the following definition [MYE76]:

Software  reliability  is  the  probability  that  the 
software  will  execute  for  a  particular  period  of 
time  without  a  failure,  weighted  by  the  cost  to 
the  user  of  each  failure  encountered. 

This definition is not necessarily based on the
actual number of errors residing within the software system
but is based on the impact that the errors have on the users.
For example, a single error in space shuttle control
software is much more important than several errors in a
matrix inversion software system which cause only "trivial"
failures. Certainly, a software system which does not
contain a serious error but has many trivial errors would
generally be considered much more reliable than a system
which does not contain the trivial errors but contains
the single serious error.

The reliability of a software system is generally
expected to grow as it evolves from the design stage to
the coding and testing stages and on to the operational
and maintenance stage. Modern software engineering
practice advocates that testing should be performed as
early as the design stage. Software errors detected in
the design stage are easier and less expensive to remove
than those detected during the testing or operational
stage. We also know that modern software design
methodologies help reduce the likelihood of committing
errors in the design stage. Redundant programming, that is,
implementing a software function in different ways, is
sometimes used to enhance software reliability.
Fault-tolerant programming is another popular technique.
However, testing still remains the most commonly used
approach to enhancing software reliability.

Testing  for  the  presence  of  errors  is  usually  done 
in  stages  [CHA78] : 

1.  First  stage  is  the  testing  done  at  the  module 
level  by  the  implementing  programmer. 

2. Modules are then integrated to form a subsystem
or the whole system, which is then tested. This is
also known as alpha testing.

3.  The  software  is  then  given  to  several  "friendly 
users"  who  are  willing  to  use  the  software  in  an 
operational  environment  and  the  problems  encountered 
with  the  software  are  reported.  This  is  known  as 
beta  testing. 

4.  Finally,  software  is  released  to  all  users  and 
corrections  are  issued  against  it  as  problems  are 
reported  by  the  users. 

This overall testing process, coupled with the design
testing process, would, hopefully, result in enhanced
reliability of the software system. Can the reliability
of the software decrease as a result of the software
correction (debugging) process? The answer is yes. This
occurs when additional errors are accidentally injected
into the system while removing some other errors.

Hopefully, with the use of better design methodologies,
better documentation techniques, better programming
languages, better testing strategies and better software
management techniques, the likelihood of software
reliability decreasing as the system evolves from the
design to the operational stage will diminish.


1.5  SOFTWARE  RELIABILITY  MODELS 

Many studies have been undertaken during the last
decade to analyze software failure data with
the objective of finding ways that will lead to improved
software performance. Such studies can be classified
into one (or both) of two categories. In the first
category, the emphasis is on the analysis of software
failure data collected from small or large projects during
development and/or operational phases. Studies in the
second category are primarily aimed at the development of
analytical models which are then used to obtain the
reliability and other quantitative measures of software
performance.

The analytical modelling work can be classified
into the following three major categories. The first
one emphasizes the stochastic nature of software failures,
while the second and the third use combinatorial analysis to
provide measures of software reliability.

1. Failure Rate Based Models.

2. Combinatorial or Error-Seeding Models.

3. Input Domain Based Models.


1. Failure (Hazard) Rate Based Models: The times
between indigenous errors or the number of indigenous
errors observed during testing are used to
estimate the shape of the hypothesized hazard
function. From the estimated hazard function,
one can estimate the number of errors remaining
in the software, the mean-time-to-failure (MTTF)
or the reliability of the software.

2. Combinatorial or Error-Seeding Models: A
known number of errors are seeded (planted) in
the program. After testing, the numbers of exposed
seeded and indigenous errors are counted.
Using combinatorics and maximum likelihood
estimation, the number of indigenous errors in the
program or the reliability of the software can be
estimated.

3. Input Domain Based Models: The basic approach
here is to generate a set of test cases from an
input (operational) distribution. Because of the
difficulty in estimating the input distribution,
the various models in this group partition the
input domain into a set of equivalence classes. An
equivalence class is usually associated with a
program path. The reliability measure is calculated
from the observed failures after execution (symbolic
or physical) of the sampled test cases.
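
The second and third approaches can be sketched numerically in Python. The text above gives no formulas, so both functions below are assumptions chosen for illustration: a simple proportional estimate for error seeding (seeded and indigenous errors are assumed equally likely to be exposed) and the observed success fraction for input-domain sampling. The function names are hypothetical.

```python
def estimate_indigenous_errors(seeded_total, seeded_found, indigenous_found):
    """Proportional estimate of the total number of indigenous errors.

    Assumes testing exposes the same fraction of indigenous errors as
    of seeded errors: indigenous_found / N = seeded_found / seeded_total.
    """
    if seeded_found == 0:
        raise ValueError("no seeded errors found; the estimate is undefined")
    return indigenous_found * seeded_total / seeded_found


def input_domain_reliability(runs, failures):
    """Estimate reliability as the fraction of sampled test cases
    (drawn from the operational input distribution) that succeed."""
    if runs <= 0:
        raise ValueError("at least one test run is required")
    return 1.0 - failures / runs


# Example figures (invented for illustration).
total_estimate = estimate_indigenous_errors(20, 10, 15)
remaining_estimate = total_estimate - 15
sampled_reliability = input_domain_reliability(200, 4)
```

With 20 seeded errors of which 10 are found, alongside 15 indigenous errors, the estimate is 30 indigenous errors in total and 15 still remaining; 4 failures in 200 sampled runs give an estimated reliability of about 0.98.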
Failure Rate Based Models


Failure  rate  based  models  can  be  further  classified 
as  shown  in  Table  1.1. 


The failure-rate (also known as hazard rate) function
z(t) is defined as the conditional probability that an error
is exposed in the interval t to t + Δt, given that the error
did not occur prior to time t [MYE76]. The reliability
function R(t) is the probability that no errors will occur
from time zero to time t. Reliability theory tells us that
z(t) and R(t) are related in the following form:

$$z(t) = \frac{-dR(t)/dt}{R(t)}$$

or

$$R(t) = \exp\left(-\int_0^t z(x)\,dx\right)$$

Also, for a constant failure rate z(t) = z,
mean-time-to-failure (MTTF) = 1/z; in general,
MTTF = $\int_0^\infty R(t)\,dt$.
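
The relation between z(t) and R(t) can be checked numerically. The Python sketch below integrates a given hazard function with the trapezoidal rule; the function names, the step count and the truncation of the MTTF integral at a finite horizon are assumptions of this illustration.

```python
import math


def reliability_curve(z, horizon, steps=10_000):
    """Return (times, curve) with curve[i] = exp(-integral_0^t_i z(x) dx).

    The integral is accumulated with the trapezoidal rule on a uniform
    grid of `steps` intervals over [0, horizon].
    """
    h = horizon / steps
    times = [i * h for i in range(steps + 1)]
    cumulative = 0.0
    curve = [1.0]  # R(0) = 1
    for i in range(1, steps + 1):
        cumulative += 0.5 * (z(times[i - 1]) + z(times[i])) * h
        curve.append(math.exp(-cumulative))
    return times, curve


def mttf(z, horizon, steps=10_000):
    """Approximate MTTF as the integral of R(t), truncated at `horizon`."""
    _, curve = reliability_curve(z, horizon, steps)
    h = horizon / steps
    return sum(0.5 * (curve[i - 1] + curve[i]) * h
               for i in range(1, steps + 1))


def constant_hazard(t):
    """Constant failure rate of 0.01 failures per unit time."""
    return 0.01
```

For the constant hazard, `reliability_curve(constant_hazard, 2000.0)` gives R(100) close to exp(-1) and `mttf(constant_hazard, 2000.0)` is close to 1/0.01 = 100, in agreement with the constant-rate formula. The horizon must be long enough for R(t) to have decayed to nearly zero, otherwise the truncated MTTF is an underestimate.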

Estimation of reliability, once the failure rate
function z(t) is known, is thus straightforward. The
failure rate based models given in Table 1.1 basically
differ in their assumption on the failure rate function
z(t). Table 1.2 below displays these differences.

TABLE 1.1  FAILURE-RATE BASED SOFTWARE RELIABILITY MODELS

Error Count Based Failure Rate Models

  Classical:
    De-Eutrophication Process Model of Jelinski-Moranda [JEL72]
    Linear Function Testing Time Model of Schick and Wolverton [SCH78]
    Parabolic Function Testing Time Model of Schick and Wolverton [SCH78]
    Shooman Model [SHO72]
    Shooman-Natarajan Model [DUN82]
    Execution Time Model of Musa [MUS75]

  Bayesian:
    Littlewood Model [LIT80]
    Goel-Okumoto Imperfect Debugging Model [GOE79]

Non-Error Count Based Failure Rate Models

  Classical:
    Non-Homogeneous Poisson Process Model of Goel & Okumoto [GOE79]
    Geometric De-Eutrophication Process Model of Moranda [MOR75]
    Geometric Poisson Process Model of Moranda [MOR75]
    Wagoner Model [DUN82]

  Bayesian:
    Littlewood and Verrall Model [LIT73]
    Thompson & Chelson Model [DUN82]



TABLE 1.2  SUMMARY OF FAILURE-RATE BASED MODELS

Model: De-Eutrophication Process Model
Assumption on z(t): The software failure occurrence rate at
any time t is assumed proportional to the number of errors
remaining in the software, i.e., for the time interval
between the (i-1)st and ith failure, we have
$z(X_i) = \phi[N - (i-1)]$, where N is the initial error
content.

Model: Schick-Wolverton Linear Failure Rate Model
Assumption on z(t): Failure rate is assumed proportional
to the number of remaining errors in the software and to
test time. For the ith interval,
$z(X_i) = \phi[N - (i-1)]X_i$.

Model: Schick-Wolverton Parabolic Failure-Rate Model
Assumption on z(t): Failure rate is assumed proportional to
residual errors and a parabolic function of test time. For
the ith interval,
$z(X_i) = \phi[N - (i-1)](-aX_i^2 + bX_i + c)$, with
a, b, c > 0.

Model: Shooman Model
Assumption on z(t):
$z(t) = K\left[E_T/I_T - \int_0^{\tau} \rho(x)\,dx\right]$ where:
  $K$: proportionality constant
  $E_T$: total number of errors
  $I_T$: total number of instructions (object code)
  $\tau$: debugging time
  $\rho(x)$: number of errors per total number of
    instructions per unit of debugging time
  $\int_0^{\tau} \rho(x)\,dx$: total number of errors per $I_T$
    removed during $\tau$ time units of debugging time

Model: Shooman-Natarajan Model
Assumption on z(t):
$z(t) = K\,\varepsilon_r(t)$ where:
  $\varepsilon_r$: number of remaining software faults
  $K$: constant of proportionality

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Execution  Time 
Model  of  Musa 


Non-Homogeneous 
Poisson  Process 
(NHPP)  Model  of 
Goel  §  Okumoto 


Geometric  De- 
Eutrophication 
Process  Model  of 
Moranda 


Geometric  Poisson 
Process  Model  of 
Moranda 


Wagoner  (Weibull) 
Model 


I.  z(t)  *  KfNQ  -  Kfn(x)  where: 

K  ;  error  exposure  ratio 

linear  execution  frequency  of 
program 

Nq:  initial  error  content 

t  :  CPU  time  utilized  in  operating 
the  program 

n(r):  net  number  of  errors 
corrected  during 


II  If  dn(x)/dt =  error  exposure  rate, 
then  ZCt}  =  KfN0exp[-Kfr] 


Assumes  that  the  error  detection  rate 
X(t)  is  time  dependent  and  is  given  by: 
X(t)  *  abexp[-bt]  where: 

a:  expected  number  of  errors  to 
be  eventually  detected 
b,  error  detection  rate  per  error 


Assume  that  the  steps  representing  the 
decrease  in  failure  rate  between 
adjacent  failure  time  are  geometri¬ 
cally  varying.  .  . 

Z(Xj)  »  DK1" 1  where: 

D:  initial  error  detection  rate 

DK;  error  detection  rate  after  the 
,  occurrence  of  the  1st  error 


DK*'*;  error  detection  rate  after  the 
occurrence  of  the  ith  error. 

A  superposition  of  a  geometric  De- 
Eutrophication  process  and  a  Poisson 
process  with  parameter  0, 

ZCX^  =  DK1  ’ 1  +  0 

zCt)  =  £  (|).X_1  where: 

o  :  scale  parameter 
X  ;  parameter  to  squeeze  or  stretch 
the  shape  of  the  distribution 
t  r  CPU  time  units 


•i 
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Goel-Okumoto 
Imperfect  Debugging 
Model 


Thompson  §  Che  Is on 
Model 


Littlewood  and 
Verrall  Model 


Littlewood  Model 


The  form  of  Z(t)  is  not  obvious: 
however,  the  reliability  function 
between  the  (k-l)st  and  Kth  failure  is; 


K- 1 


rk(x)  =  l  (K:1)PK‘j'1Q;lexp[-(N^K+j  +  l)]x 
j  =  0  J 


where : 

P;  Prob  {of  successful  correction 
of  a  defect} 

Q:  Prob  {of  imperfect  debugging}3  1-P 

X:  Parameter  of  the  exponential 

distributions  governing  the  time 
between  failures 

N:  Estimated  initial  number  of  defects. 


z(t)  =  X  but  X  is  treated  as  a  random 
variably  with  a  gamma  density  function 

VAV  ° 

0  U  exp l -  T0] 


rncTTr 


where : 

KqJ  observed  failures 

Tq;  testing  time  for  KQ 

Kq  and  Tq  essentially  represent 
'  previous  testing  experience. 


zC.tl=  X  but  X  is  treated  as  a  random 
variable  distributed  as  Gamma  with 
shape  parameter  a  and  scale  parameter 
¥(i),  an  increasing  function  of  i. 

Z(X.}=X.  and  X-  is  distributed  as 
11  1  i-1 

gamma  ClN-i  +  l}a,  6+  £  t^),  where: 

N-ri  +  l;  number  of  errors  remaining 
when  (i-1)  failures  have 
occurred, 

t.:  execution  time  from  (j-l)st 

J  failure  to  the  jth  failure, 

a, 6:  parameters  of  gamma  distribution. 
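
To make the differences among these assumptions concrete, the following Python sketch evaluates three of the failure-rate forms summarized in Table 1.2 (De-Eutrophication, Schick-Wolverton linear, and Geometric De-Eutrophication). The function names and all parameter values are hypothetical and chosen only for illustration.

```python
def jelinski_moranda_rate(phi, n_initial, i):
    """z(X_i) = phi * [N - (i - 1)]: constant within the i-th interval."""
    return phi * (n_initial - (i - 1))

def schick_wolverton_linear_rate(phi, n_initial, i, x):
    """z(X_i) = phi * [N - (i - 1)] * x: grows with test time x in the interval."""
    return phi * (n_initial - (i - 1)) * x

def geometric_rate(d, k, i):
    """z(X_i) = D * K**(i - 1): geometric steps between successive failures."""
    return d * k ** (i - 1)

# Hypothetical parameter values, chosen only for illustration
jm = [jelinski_moranda_rate(0.01, 10, i) for i in (1, 2, 3)]
sw = schick_wolverton_linear_rate(0.01, 10, 1, 2.0)
geo = [geometric_rate(0.1, 0.5, i) for i in (1, 2, 3)]
```

In this sketch the De-Eutrophication rate drops by the same amount after each failure, whereas the geometric rate drops by a constant factor.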


As noted above, a number of assumptions are made in
the development of failure-rate models. A discussion of
these assumptions is provided in the following paragraphs
to point out the dangers in the use of these models when
the assumptions are not satisfied. It should, however, be
noted that some models could be robust to departures from
many of these assumptions and can still be used for reliability
assessment purposes.

1. All the models described above assume that any error
detected is immediately corrected. The correction
process does not otherwise alter the program. All corrections
remove the detected error (except in the IDM model) and
do not result in the introduction of new errors. It
is not hard to accept, however, that correction of a detected
error in a program may result in new errors in the
program. Goel and Okumoto [GOE79] tried to address
the assumption that every correction removes the detected
error by formulating the Imperfect Debugging Model (IDM).
IDM assumes that the number of errors in the system at time
t is governed by a Markov process. Time between transitions
is exponentially distributed with rates dependent on
the current error content of the program. The state
transitions are governed by the probability of imperfect
debugging. No one has yet addressed the problem in which
the debugging process introduces new errors into the
software.


2. Models such as those by Jelinski and Moranda, Musa,
and Shooman assume that the software failure rate
is a constant multiple of the number of remaining
errors. This is the same as saying that each error
in a given time interval (between failures) has the
same chance of being detected. This, obviously, is
not always true since errors that happen to reside
in a portion of the code that is frequently executed
by the user (or tested by the user) have a higher
probability of being detected. Errors which reside
in the unreachable (or never used) portion of the
code will obviously have a lower (or zero) probability
of being detected. Moranda tried to address this
problem by reformulating the De-Eutrophication model
into the Geometric De-Eutrophication Model and later
into the Geometric Poisson Model. In these variations,
the failure rate between adjacent failure intervals
is geometrically varying.

The NHPP model by Goel and Okumoto also tried to
alleviate these problems (i.e., problems with the
constant failure rate models) by postulating a
time-dependent error detection rate model. Littlewood is more
ambitious in trying to address this problem. He
postulated a model which assumes that each remaining error
in the program has a different rate of occurrence. The
failure rate of the overall program is then just the sum
of the individual errors' rates of occurrence.

3. The Schick-Wolverton models happen to model a process
where there is an increasing failure rate between
failures. This may be a ridiculous assumption if we argue
that software does not wear out. But there can be
cases where the software failure rate might in fact
increase and this may be attributed to the increased
intensity of testing. This phenomenon is usually
observed during the early stages of the software
development cycle.

4. Basing the time between failures on execution
(CPU) time, as was assumed by Musa, Littlewood and
Wagoner, may sometimes be unrealistic. An increase in
accumulated time between two adjacent failures may not
necessarily mean that the software has fewer and fewer
errors or, putting it equivalently, that the
software's reliability is improving. A very simple
example will illustrate this point. Consider a program
containing only a single error. The same copy of the
program is given to two debuggers. One debugger spends
a lot of time running and re-running the program (which
can be very tempting to do on on-line and timesharing
systems) trying to uncover the error. The second
debugger, on the other hand, spends a lot of time analyzing
the program before even attempting to make a test run.
Suppose both debuggers are successful in finding the
error. What is the resulting reliability of the software?
Execution time theory says that since the CPU time
between failures of the first software is larger than
that of the second software, then the first software
is more reliable. Of course we know that this is not
true since both copies of the software have the same
reliability.

Another example where execution time may be misleading
is when a selected subset of the program is executed
repeatedly. While the execution time is accumulating,
the test coverage is not, and this will lead to an
incorrect assessment of reliability. The most appropriate
time unit to use for interfailure times is still a
controversial topic.

5. What about the assumption of independence of
interfailure times? Is this a realistic assumption? Chances
are it is not. The testing process that is used to
uncover errors is usually not a random process. The time
to the next failure may very well depend on the nature
and time to failure of the previous error. If the
previous error was a very critical one, then we might decide
to intensify the testing process and look for more critical
errors. This intensification in the testing process may
mean a shorter time to the next failure than what might
have happened if the testing intensity were maintained at
normal levels.


6. Most of the models require time between failure data to
estimate reliability. There can be cases when the mean
time between failures is infinite; in such cases, these
models become useless. The mean time between failures
can be infinite if the user of the software has
requirements that would only traverse the error-free
paths of the program.

7. Basing the reliability of the software on the remaining
number of errors can also sometimes be ridiculous. A
user does not really care whether the software has a
certain number of remaining errors. As long as all
his requirements are met correctly by the software,
then as far as the user is concerned, the software is
100 percent reliable. Littlewood [LIT80] argued that
a program with two bugs in little exercised portions
of the code can be more reliable than a program with
only one but frequently encountered bug.

8. All the models implicitly assume that the testing
process, which generated the estimate for the failure
rate, will be the same as the operating environment.
This again is not generally true. A reliability measure
conditioned on the user requirements rather than a
simple unconditioned software reliability measure
would be more realistic.

9. Some models assume that software reliability is
time-dependent. Most software fails not because of the
length of time it has been in use but because
of the nature of the input to which it is subjected.
Some software, like real time control software or
operating systems, gives an illusion of failing with
time-of-use because it is used almost continuously.
In those environments, the time-dependence assumption
may be valid.

10. Perhaps the most fundamental assumption is the
treatment of the software as a black box. At least some
software reliability models should take into
consideration software characteristics and the characteristics
of the software development process in addition to
the failure times and the number of remaining errors.


1.5.2  Combinatorial  or  Error-Seeding  Models 


A number of combinatorial models have been proposed
but the most popular (and most basic) is Mills'
Hypergeometric Model. This model requires that a number of known
errors be randomly inserted (seeded) in the program to be
tested. The program is then tested for some amount of
time. The number of original indigenous errors can be
estimated from the numbers of indigenous and seeded errors
uncovered during the test by using the hypergeometric
distribution.

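Let

$n_1$ = number of seeded errors
$K$ = number of seeded errors detected during testing
$N$ = total number of indigenous errors
$r$ = number of indigenous errors detected during testing

We then have

$$ P[K \text{ seeded errors among the } r + K \text{ errors detected}] = \frac{\binom{n_1}{K} \binom{N}{r}}{\binom{N + n_1}{r + K}} $$

and

$$ \text{MLE for } N = \left\lfloor \frac{n_1\, r}{K} \right\rfloor . $$
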

A variant of the above model is the so-called Binomial
model. Let $q_i$ = Prob[errors] on each run $i$; then we have

$$ \text{Prob}[x \text{ errors in } y \text{ trials}] = \binom{y}{x}\, q_i^{\,x}\, (1 - q_i)^{\,y - x} . $$

The most serious assumption of the above models is that
the indigenous and seeded errors are assumed to have the
same probability of being detected. In other words, the
seeded errors must be of the same type and should have
the same distribution as the indigenous errors. This,
of course, is difficult to achieve in real-world conditions.


A suggestion is given in [HO78] to overcome this
problem. In this improved approach, two teams
test the program independently. Suppose team 1 detects
n errors and team 2 detects r errors, and the number
of errors common to both teams is K. We can then view
the errors detected by one team, say team 1, as seeded
errors, and estimate the total number of indigenous errors
N to be nr/K. Note, however, that simple errors may be
discovered first and the distribution of errors detected
may not resemble the actual distribution of errors, so
that the estimates may be biased.
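
The seeding estimate and the two-team estimate nr/K can be sketched in Python as follows. The hypergeometric likelihood is included only to check that the integer part of nr/K maximizes it; the function names and the example counts are hypothetical and are not taken from [HO78].

```python
from math import comb

def seeded_likelihood(n_indigenous, n_seeded, k_seeded_found, r_indigenous_found):
    """Hypergeometric probability of finding K seeded and r indigenous errors."""
    if n_indigenous < r_indigenous_found:
        return 0.0
    total_found = k_seeded_found + r_indigenous_found
    return (comb(n_seeded, k_seeded_found) * comb(n_indigenous, r_indigenous_found)
            / comb(n_indigenous + n_seeded, total_found))

def estimate_indigenous(n_seeded, k_seeded_found, r_indigenous_found):
    """Point estimate N = n * r / K (integer part), as in the two-team argument."""
    if k_seeded_found == 0:
        raise ValueError("no seeded (or common) errors found; estimate undefined")
    return (n_seeded * r_indigenous_found) // k_seeded_found

# Hypothetical counts: 20 seeded errors, 8 of them found, 12 indigenous errors found
n_hat = estimate_indigenous(20, 8, 12)
```

For the two-team variant, the same function can be called with team 1's count as `n_seeded`, the number of common errors as `k_seeded_found`, and team 2's count as `r_indigenous_found`.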

The obvious advantage of these combinatorial
models over the failure-rate based models is that they are
based on fewer and much simpler assumptions.

1.5.3  Input  Domain  Based  Models 

A good representative set of models in this group
includes the Nelson (TRW) model [BRO75], the Ho model [HO78],
and the Bastani model [BAS80].

Nelson  (TRW)  Model 

The reliability of the software is measured by exposing
(running) the software with a sample of n inputs. The n
inputs are randomly chosen from the input domain set
$E = (E_i : i = 1, \ldots, N)$ where each $E_i$ is the set of data
values needed to make a run. The random sampling of n inputs is
done according to a probability distribution $P_i$; the set
$(P_i : i = 1, \ldots, N)$ is the "operational profile" or simply
user input distribution. If $n_e$ is the number of inputs that
resulted in execution failure, then an unbiased estimate
for the software reliability $\hat{R}$ is $1 - (n_e/n)$. However,
it may be the case that the test set used during the
verification phase is not representative of the
expected operational usage. Brown and Lipow [BRO75]
suggested an alternative formula for $\hat{R}$ which is

$$ \hat{R} = 1 - \sum_{j=1}^{N} \frac{f_j}{n_j}\, P(E_j) $$

where

$n_j$ = number of runs sampled from input subdomain $E_j$
$f_j$ = number of failures observed out of $n_j$ runs.


The main difference between Nelson's $\hat{R}$ and Brown and
Lipow's $\hat{R}$ is that the latter explicitly incorporates the
usage distribution or the test case distribution, through the
weights $P(E_j)$, while the former implicitly assumes that the
accomplished testing is representative of the expected usage
distribution.

Both  models  assume  prior  knowledge  of  the  operational 
usage  distribution.  This  may  not  be  easy  to  do  for  some 
real-world  software.  Another  criticism  of  this  approach 
is  the  use  of  random  testing. 
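
The two input-domain estimates can be compared with a short Python sketch. The function names and the test record below are hypothetical; the sketch only evaluates the pooled estimate 1 - (n_e/n) and the subdomain-weighted estimate for one made-up set of counts and usage probabilities.

```python
def nelson_reliability(n_runs, n_failures):
    """R = 1 - n_e / n for a sample of n runs with n_e failures."""
    return 1.0 - n_failures / n_runs

def brown_lipow_reliability(subdomains):
    """R = 1 - sum_j (f_j / n_j) * P(E_j).

    `subdomains` is a list of (n_j, f_j, p_j) tuples: runs, failures, and
    usage probability for each input subdomain E_j.
    """
    return 1.0 - sum((f / n) * p for n, f, p in subdomains)

# Hypothetical test record: three subdomains with usage probabilities 0.7, 0.2, 0.1
record = [(50, 0, 0.7), (30, 3, 0.2), (20, 4, 0.1)]
r_nelson = nelson_reliability(100, 7)           # pooled: 7 failures in 100 runs
r_brown_lipow = brown_lipow_reliability(record)
```

With these made-up numbers the two estimates differ because the failing subdomains received a larger share of the test runs than their assumed share of operational usage.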


Ho  Model 

Reliability estimation in this model proceeds by
first generating the symbolic execution tree of the
program. This tree characterizes all the execution paths
and their associated outputs in the program. The nodes
represent statements while the edges represent the state
vector resulting from symbolic execution along the path
from the root statement to the current statement. A
procedure for generating the symbolic execution tree
is given in [HO78]:

I.   The first statement is the root of the tree.

II.  If a leaf is not a STOP or RETURN statement,
     symbolically execute the statement corresponding
     to the node. If the current statement is a
     conditional statement, the feasibility of the
     branches is examined. New nodes are created
     for statements which are successors of the
     current statement. Edges, labelled with state
     vectors, are joined between the current node
     and the new node(s).

III. Go to II.

The generated execution paths from the symbolic
execution tree are proven correct or are sample tested. For
a given path, say path i, if it is proven correct, then
the path reliability $R_i = 1$. If path i cannot be proven
correct, a random sample of N test cases is generated
that will execute path i. If no failures result from
the execution of the N test cases, then $R_i$ is bounded
below by $1 - C_i$ where $C_i$ is the confidence interval of
path i. The length of $C_i$ is a function of our given
confidence coefficient $\alpha$. On the other hand, if n
failures are observed and the errors not corrected, then
$R_i$ is bounded below by $\frac{N - n}{N} - C_i$. If the observed n
failures are corrected, then the sample testing is
repeated for path i.

Finally, the software reliability estimate R is
calculated from

$$ R = \sum_{i=1}^{m} f_i R_i $$

where:

$f_i$ = weighting factor of path i which
        corresponds to the execution frequency of path i.
$m$ = total number of execution paths.

One difficulty with applying this approach is the
large number of paths that may exist for real world
software.
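
The way the path bounds combine into a program-level estimate can be sketched in Python. The values of $C_i$ are treated as given inputs, since the text does not say how they are computed, and the weights $f_i$ are assumed here to be normalized to sum to one; all names and numbers are hypothetical.

```python
def path_reliability_bound(n_tests, n_failures, c_i, proven_correct=False):
    """Lower bound on R_i for one path.

    Proven-correct paths get R_i = 1; otherwise the bound is
    (N - n) / N - C_i, which reduces to 1 - C_i when n = 0.
    """
    if proven_correct:
        return 1.0
    return (n_tests - n_failures) / n_tests - c_i

def program_reliability(paths):
    """R = sum_i f_i * R_i over (f_i, R_i) pairs (weights assumed to sum to 1)."""
    return sum(f * r for f, r in paths)

# Hypothetical three-path program; the C_i values are placeholders
paths = [
    (0.6, path_reliability_bound(0, 0, 0.0, proven_correct=True)),
    (0.3, path_reliability_bound(100, 0, 0.03)),
    (0.1, path_reliability_bound(100, 5, 0.03)),
]
r_program = program_reliability(paths)
```

Because each untested or failing path contributes its own bound, the number of terms grows with the number of execution paths, which is the practical difficulty noted above.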

Bastani  Model 

This input domain based model estimates the reliability
R from the relation

$$ \hat{R} = 1 - \hat{V}_{er} $$

where:

$\hat{V}_{er}$ = the total error size remaining in the program.

$\hat{V}_{er}$ can be determined by testing the program and locating and
estimating the size of errors found [BAS80]. An error
has a large size if it is easily detected (i.e., if it
affects many input elements). An error has a small size
if it is relatively difficult to detect. The size of an
error depends on the way test inputs are selected. Good
test case selection strategies, like path testing and boundary
value analysis, magnify the size of an error since they
exercise error-prone constructs. The observed error
size is lower if random testing is employed. Although
the model does not assume random testing (in fact, any
test strategy can be employed), it offers no easy or
systematic way to estimate $\hat{V}_{er}$.


SECTION  2 


A  TIME  DEPENDENT  FAULT  DETECTION  RATE  MODEL 

2.1 INTRODUCTION

In this section, our objective is to develop a
parsimonious model whose parameters have a physical
interpretation, and which can be used to predict
various quantitative measures for software performance
assessment. Also of interest is the applicability of
the model over a broad class of projects. Further, it
should be possible to estimate the parameters of the
model from available failure data which could be given
as either the number of failures in specified time
intervals, or as times between software failures.

With this objective, we develop and investigate
a nonhomogeneous Poisson process (NHPP) [BRO72] model
with a time dependent fault detection rate for the
software failure phenomenon. By studying the behavior of
$N(t)$, the cumulative number of failures by time t,
it is shown in section 2.2 that this process can be well
described by an NHPP with a two parameter exponentially
decaying fault detection rate.

NHPP has been used by many researchers to describe
random phenomena in various applications [CRO74, DUN75,
DUA64]. Some such applications are the occurrences of
coal mining disasters [MAG52]; equipment failures [DUA64,
LEW64, PRO63]; transactions in a data-base system [LEW76];
and software error counts over a series of time intervals
[SCH75]. Various forms of the intensity function for
the NHPP used in actual applications are the exponential
polynomial rate function [LEW76], a log-linear rate
function [COX66], and a Weibull rate function [CRO74,
DON75, MOE76].

Several measures for software performance
assessment, such as the number of faults remaining in the
system, distribution of time to next failure, and software
reliability, are proposed in section 2.3. Based on the
NHPP model, expressions are then derived for obtaining
the estimates and confidence limits for these
performance measures.

Two methods are described in section 2.4 for
estimating the parameters of the model from available
failure data. The first one is for the case when data is
given in the form of number of failures in given time
intervals. The time intervals can be of equal or un-

2.2 MODEL DEVELOPMENT


A  software  system  in  use  is  subject  to  failures 
caused  by  faults  present  in  the  system.  The  faults  are 
encountered  when  a  sequence  of  instructions  is  executed 
which,  in  turn,  depends  on  the  input  data  set.  In  this 
section,  we  develop  a  model  to  describe  this  failure 
occurrence  phenomenon. 

2.2.1  Deterministic  Analysis  of  Software  Failure  Process 

It is useful to first make a simpler analysis by
ignoring the statistical fluctuations in the number of
software failures before analyzing the failure phenomenon as
a stochastic process [COX65]. Let $n(t)$ denote the
cumulative number of software failures detected by time t.
Assume that $n(t)$ is large enough so that it can be
expressed as a continuous function of t. Since the number
of errors in a system is a finite value, $n(t)$ is a
bounded non-decreasing function of t with

$$ n(0) = 0, \qquad n(\infty) = a . \tag{2.1} $$

For purposes of modeling, we assume that the usage of
the system is basically similar over time. Then the
number of failures in $(t, t+\Delta t)$ is proportional to the
number of undetected faults at t, i.e.,

$$ n(t+\Delta t) - n(t) = b\{a - n(t)\}\Delta t , \tag{2.2} $$

where b is a proportionality constant.

A  graphical  representation  of  the  above  description 
is  provided  in  Figure  2.1. 

Now, from Equation (2.2), we get the differential
equation

$$ n'(t) = ab - b\,n(t) . \tag{2.3} $$

Taking the Laplace transform [ABR65, BUC56] of
Equation (2.3) under the conditions of Equation (2.1), we
have

$$ s\,\tilde{n}(s) = \frac{ab}{s} - b\,\tilde{n}(s) $$

or

$$ \tilde{n}(s) = \frac{ab}{s(s+b)} , \tag{2.4} $$

where

$$ \tilde{n}(s) = \int_0^\infty e^{-st}\, n(t)\,dt . \tag{2.5} $$

FIGURE 2.1. A Graphical Representation of the
Deterministic Model for Software Failures.
(Time axis marks: t, Δt, t+Δt.)

The solution of Equation (2.3) is thus obtained by
inverting Equation (2.4) and is given by

$$ n(t) = a\,(1 - e^{-bt}) . \tag{2.6} $$

Under the assumptions discussed above, Equation
(2.6) is the deterministic model of the software
failure process. For given a and b, we can easily compute
the number of failures to be encountered by some time t
so that the failure phenomenon can be described with
certainty. It should be noted, however, that the actual
failure phenomenon is not deterministic.
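
As a quick check on the deterministic model, the following Python sketch compares the closed-form solution of Equation (2.6) with a direct step-by-step integration of the difference relation of Equation (2.2). The function names, the forward Euler scheme, and the parameter values are assumptions made here for illustration only.

```python
import math

def failures_by_time(a, b, t):
    """Closed-form solution n(t) = a * (1 - exp(-b * t)) of Equation (2.6)."""
    return a * (1.0 - math.exp(-b * t))

def euler_failures(a, b, t, steps=100_000):
    """Integrate n'(t) = a*b - b*n(t) with n(0) = 0 by forward Euler steps."""
    dt = t / steps
    n = 0.0
    for _ in range(steps):
        # Equation (2.2): increment is b * (a - n) * dt
        n += b * (a - n) * dt
    return n

# Hypothetical parameters: a = 100 faults, b = 0.05 per unit time
closed = failures_by_time(100.0, 0.05, 40.0)
stepped = euler_failures(100.0, 0.05, 40.0)
```

The two values agree closely for small step sizes, and both approach a as t grows.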

2.2.2  Stochastic  Analysis  of  Software  Failure  Process 

In actual usage, the software system is
subjected to random inputs causing the failures to occur at
random times, i.e., the failure phenomenon is stochastic
(non-deterministic). Therefore, a realistic description
of the failure process must incorporate this randomness.

Let $\{N(t), t \ge 0\}$ be a counting process [PYK61,
ROS76, SNY75] representing the cumulative number of
failures by time t. (Note that $N(t)$ is a random
variable while $n(t)$ above was taken to be deterministic.)
Assuming that each failure is caused by one fault, $N(t)$
also represents the cumulative number of faults
detected by time t. It should be pointed out that a detected
fault may not be removed and, as a result, may cause
additional failure(s) at a later stage. For the $N(t)$
process, such recurrences are counted as new events.

Let $m(t)$ be the mean value function of the $N(t)$
process, i.e.,

$$ m(t) = E[N(t)] . \tag{2.7} $$

Since $m(t)$ represents the expected number of software
failures or detected faults by time t, it is a
non-decreasing function of t. If we assume that there will
be a finite number of faults to be detected over a long
period of time, $m(t)$ has the following boundary
conditions:

$$ m(0) = 0, \qquad m(\infty) = a , \tag{2.8} $$

where $a < \infty$ and represents the expected number of
software faults to be eventually detected. Furthermore, it
is assumed that, for small $\Delta t$, the expected number of
software failures during $(t, t+\Delta t)$ is proportional to
the expected number of software faults undetected by
time t, i.e.,

$$ m(t+\Delta t) - m(t) = b\{a - m(t)\}\Delta t , \tag{2.9} $$

where b is a constant of proportionality. Solving the
differential equation obtained from Equation (2.9) under
the boundary conditions of Equation (2.8), we get

$$ m(t) = a\,(1 - e^{-bt}) . \tag{2.10} $$

This equation specifies the mean value function for the
underlying software failure counting process $N(t)$. The
intensity function, obtained by taking the derivative
of $m(t)$, represents the fault detection rate at time t
and is given by

$$ \lambda(t) = m'(t) = ab\,e^{-bt} . \tag{2.11} $$

We now study the probabilistic behavior of the
$N(t)$ process by using $m(t)$ and $\lambda(t)$. Since there are
no failures at t = 0, we have $N(0) = 0$. It is also
reasonable to assume that the numbers of software
failures during non-overlapping time intervals are
independent. In other words, for any finite collection of
times $t_1 < t_2 < \ldots < t_n$, the n random variables $N(t_1)$,
$\{N(t_2) - N(t_1)\}, \ldots, \{N(t_n) - N(t_{n-1})\}$ are statistically
independent. This implies that the counting process
$\{N(t), t \ge 0\}$ has independent increments.

We assign the probabilities on the increments of
the $N(t)$ process as follows:

$$ N(t+\Delta t) - N(t) = \begin{cases} 0 & \text{with probability } 1 - \lambda(t)\Delta t + o(\Delta t) \\ 1 & \text{with probability } \lambda(t)\Delta t + o(\Delta t) \\ \ge 2 & \text{with probability } o(\Delta t) \end{cases} \tag{2.12} $$

where $o(\Delta t)/\Delta t \to 0$ as $\Delta t \to 0$.

The underlying $N(t)$ process satisfying conditions of
Equation (2.12) is now an NHPP with mean value function
$m(t)$ and intensity function $\lambda(t)$ as given in Equations
(2.10) and (2.11), respectively [FELW57, FELW60].

Hence, the distribution of $N(t)$ is given by

$$ P\{N(t) = y\} = \frac{\{m(t)\}^{y}}{y!}\, e^{-m(t)}, \quad y = 0, 1, 2, \ldots \tag{2.13} $$

Under the assumptions discussed above, the stochastic
behavior of the software failure phenomenon can be
completely described by Equation (2.13). It should be
pointed out that Equation (2.9) implies that the ratio

$$ \frac{\text{Number of faults detected during } (t, t+\Delta t)}{(\text{Number of faults undetected by } t)\,\Delta t} = b \tag{2.14} $$

is constant at any time t. Therefore, b can be
interpreted as the error detection rate per error.

Equations (2.10) and (2.13) constitute the basic
software failure model under study in this report.
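
The basic model can be evaluated numerically with a few lines of Python. The sketch below computes the mean value function, the intensity function, and the Poisson probabilities of Equation (2.13); the function names and the parameter values a = 100 and b = 0.05 are hypothetical and serve only as an illustration.

```python
import math

def mean_value(a, b, t):
    """m(t) = a * (1 - exp(-b * t)), Equation (2.10)."""
    return a * (1.0 - math.exp(-b * t))

def intensity(a, b, t):
    """lambda(t) = a * b * exp(-b * t), Equation (2.11)."""
    return a * b * math.exp(-b * t)

def prob_failures(a, b, t, y):
    """P{N(t) = y} from Equation (2.13), evaluated in log space for stability."""
    m = mean_value(a, b, t)
    if m == 0.0:
        return 1.0 if y == 0 else 0.0
    return math.exp(y * math.log(m) - m - math.lgamma(y + 1))

# Hypothetical parameters: a = 100, b = 0.05, evaluated at t = 10
a, b, t = 100.0, 0.05, 10.0
pmf = [prob_failures(a, b, t, y) for y in range(200)]
```

Summing `pmf` gives a value very close to one, and its mean matches `mean_value(a, b, t)`, as expected for a Poisson distribution with mean m(t).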


2.3 SOFTWARE PERFORMANCE MEASURES

The model developed in section 2.2 is a
description of the failure phenomenon. In order to use this
model to predict software performance, we generally
need expressions for quantitative measures, such as the
number of failures by some prespecified time, the number
of faults remaining in the software at a future time,
and software reliability during a mission. In this
section, we develop models that can be employed to estimate
such quantities.

2.3.1  Number  of  Software  Faults  Detected  by  t 

For given a and b, the distribution of $N(t)$, the
cumulative number of software failures detected by time
t, is obtained from Equations (2.10) and (2.13) as

$$ P\{N(t) = y\} = \frac{\{a(1 - e^{-bt})\}^{y}}{y!}\, e^{-a(1 - e^{-bt})}, \quad y = 0, 1, 2, \ldots \tag{2.15} $$

In other words, $N(t)$ has a Poisson distribution with
mean

$$ m(t) = E[N(t)] = a\,(1 - e^{-bt}) . \tag{2.16} $$

Note that

$$ P\{N(\infty) = y\} = \frac{a^{y}}{y!}\, e^{-a}, \quad y = 0, 1, 2, \ldots, \tag{2.17} $$

i.e., the distribution of $N(\infty)$, the total number of
failures encountered or faults detected if the system
is used indefinitely, is also a Poisson distribution
with mean a. This result is consistent with
theoretical studies which indicate that the Poisson process
is the limiting distribution of many phenomena similar
to the software error occurrence phenomenon [MIL76,
SNY75].


2.3.2  Number  of  Remaining  Faults 

We have been considering the number of failures
encountered by time t, $N(t)$. Since many of the performance
measures depend on the number of faults remaining in the
system, we now consider this quantity.

Let $\bar{N}(t)$ be the number of faults remaining in the
system at time t, i.e.,

$$ \bar{N}(t) = N(\infty) - N(t) . \tag{2.18} $$

The expectation of $\bar{N}(t)$ is given by

$$ E[\bar{N}(t)] = a\,e^{-bt} . \tag{2.19} $$


2.3.3 Conditional Distribution and Expectation of $\bar{N}(t)$

If we have already observed y faults, it is useful to know
the distribution of the number of faults yet to be detected.
In other words, the conditional distribution of $\bar{N}(t)$,
given that N(t) = y, is

$$ P\{\bar{N}(t) = x \mid N(t) = y\} = P\{N(\infty) - N(t) = x \mid N(t) = y\}. \tag{2.20} $$

Now the event $\bar{N}(t) = x$ denotes occurrences over the
time interval $(t, \infty)$, while the event N(t) = y denotes
occurrences over the interval (0, t), i.e., these two events
refer to non-overlapping time intervals. From a basic property
of the NHPP, such events are independent of each other, so
that we have

$$ P\{\bar{N}(t) = x \mid N(t) = y\} = P\{\bar{N}(t) = x\}, \qquad x = 0, 1, 2, \ldots, \tag{2.21} $$

that is,

$$ P\{N(\infty) - N(t) = x \mid N(t) = y\} = \frac{\{m(\infty) - m(t)\}^{x}}{x!}\, e^{-\{m(\infty) - m(t)\}}. $$

Or, substituting for $m(\infty)$ and m(t) from Equation (2.10),
we get

$$ P\{N(\infty) - N(t) = x \mid N(t) = y\} = \frac{\{a - a(1 - e^{-bt})\}^{x}}{x!}\, e^{-\{a - a(1 - e^{-bt})\}}. $$

This yields

$$ P\{\bar{N}(t) = x \mid N(t) = y\} = \frac{\{a e^{-bt}\}^{x}}{x!}\, e^{-a e^{-bt}}. \tag{2.22} $$

Finally, the expected number of faults to be detected, given
N(t) = y, is

$$ E[\bar{N}(t) \mid N(t) = y] = a e^{-bt}. \tag{2.23} $$


This conditional distribution is important for deciding
whether the software system under development can be released.
The decision should be based on the number of faults remaining
in the software, because this quantity plays an important role
in software reliability assessment. Suppose that the
decision-maker conducts an experiment and finds y software
faults by time t. Then, a decision rule might be to

Accept if $\bar{N}(t) \le n_0$,

Reject if $\bar{N}(t) > n_0$,

where $n_0$ is some specified number. For this decision rule,
the probability that the software system is accepted for a
given number of failures y by time t is

$$ P\{\text{Accept}\} = P\{\bar{N}(t) \le n_0 \mid N(t) = y\} $$

and, using Equation (2.22), becomes

$$ P\{\text{Accept}\} = \sum_{i=0}^{n_0} P[\bar{N}(t) = i \mid N(t) = y]. \tag{2.24} $$
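The acceptance probability of Equation (2.24) is a finite Poisson sum with mean $a e^{-bt}$ taken from Equation (2.22). A minimal Python sketch follows; the function names and the values of a, b, t, and $n_0$ are assumptions chosen only for illustration.

```python
import math


def remaining_pmf(x, t, a, b):
    """P{Nbar(t) = x | N(t) = y} of Equation (2.22); it does not depend on y."""
    mean = a * math.exp(-b * t)
    if mean == 0.0:
        return 1.0 if x == 0 else 0.0
    return math.exp(x * math.log(mean) - mean - math.lgamma(x + 1))


def accept_probability(n0, t, a, b):
    """P{Accept} of Equation (2.24): probability that at most n0 faults remain."""
    return sum(remaining_pmf(i, t, a, b) for i in range(n0 + 1))


# Illustrative (assumed) values: a = 34, b = 0.006, testing stopped at t = 250.
for n0 in (2, 5, 10):
    print(n0, accept_probability(n0, 250.0, 34.0, 0.006))
```

Because the conditional distribution does not depend on y, the acceptance probability for fixed a and b is driven entirely by the length of the test period t and the threshold $n_0$.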


The conditional expectation of $\bar{N}(t)$, given N(t) = y,
is given by

$$ E[\bar{N}(t) \mid N(t) = y] = E[\bar{N}(t)] $$

$$ E[\bar{N}(t) \mid N(t) = y] = a e^{-bt}. \tag{2.25} $$

Therefore, the expected number of faults remaining in the
software system at time t, given that y errors have been
detected during the testing period t, is simply the expected
number of faults to be detected during $(t, \infty)$.
2.4 SOFTWARE RELIABILITY AND DISTRIBUTION OF TIME
BETWEEN FAILURES


2.4.1  Software  Reliability 

Let a sequence of random variables $\{X_i,\ i = 1, 2, \ldots\}$
denote a sequence of times between software failures
associated with the N(t) process. Then $X_i$ denotes the time
between the (i-1)st and the ith failures. We also define

$$ S_n = \sum_{i=1}^{n} X_i, \qquad n = 1, 2, \ldots \tag{2.26} $$

which represents the time to the nth failure. Let
$\Phi_{X_1}(x)$ be the Cumulative Distribution Function (cdf)
of $X_1$, i.e.,

$$ \Phi_{X_1}(x) \equiv P\{X_1 \le x\}. \tag{2.27} $$

Note that the event $\{X_1 > x\}$ implies that there are no
failures during (0, x], i.e., the event $\{N(x) = 0\}$. Then,
using Equation (2.15), the reliability function associated
with the first failure time is given by

$$ R_{X_1}(x) = P\{X_1 > x\} = P\{N(x) = 0\} $$

$$ R_{X_1}(x) = e^{-a(1 - e^{-bx})}. \tag{2.28} $$

Now, the cdf of $X_1$ can be written as

$$ \Phi_{X_1}(x) = 1 - R_{X_1}(x) $$

$$ \Phi_{X_1}(x) = 1 - e^{-a(1 - e^{-bx})}. \tag{2.29} $$

The Probability Density Function (pdf) is defined as

$$ \phi_{X_1}(x) \equiv \frac{d}{dx}\, \Phi_{X_1}(x) $$

so that

$$ \phi_{X_1}(x) = a b\, e^{-bx}\, e^{-a(1 - e^{-bx})}. \tag{2.30} $$


Next, consider the conditional probability distribution,
$\Phi_{X_2|X_1}(x|s)$, of $(X_2 \mid X_1)$. The event
$\{X_2 > x \mid X_1 = s\}$ implies {no failures in (s, s+x]}.
Then the conditional reliability function of the second
failure, given that the first failure occurs at time s, is
given by

$$ \begin{aligned}
R_{X_2|X_1}(x|s) &= P\{X_2 > x \mid X_1 = s\} \\
&= P\{\text{no failures in } (s, s+x]\} \\
&= P\{N(s+x) - N(s) = 0\} \\
&= e^{-[m(s+x) - m(s)]} \\
&= e^{-a[e^{-bs} - e^{-b(s+x)}]}.
\end{aligned} \tag{2.31} $$

From Equation (2.31), we obtain

$$ \Phi_{X_2|X_1}(x|s) = 1 - R_{X_2|X_1}(x|s) = 1 - e^{-a\{e^{-bs} - e^{-b(s+x)}\}} \tag{2.32} $$

and

$$ \phi_{X_2|X_1}(x|s) \equiv \frac{d}{dx}\, \Phi_{X_2|X_1}(x|s) = a b\, e^{-b(s+x)}\, e^{-a\{e^{-bs} - e^{-b(s+x)}\}}. \tag{2.33} $$
Combining Equations (2.30) and (2.33), we get the joint
density of $X_1$ and $X_2$ as

$$ \begin{aligned}
\phi_{X_1,X_2}(x_1, x_2) &= \phi_{X_1}(x_1)\, \phi_{X_2|X_1}(x_2|x_1) \\
&= (a b\, e^{-bx_1})(a b\, e^{-b(x_1+x_2)})\, e^{-a(1 - e^{-bx_1})}\, e^{-a\{e^{-bx_1} - e^{-b(x_1+x_2)}\}}
\end{aligned} $$

$$ \phi_{X_1,X_2}(x_1, x_2) = a^2 b^2\, e^{-bx_1}\, e^{-b(x_1+x_2)}\, e^{-a\{1 - e^{-b(x_1+x_2)}\}}. \tag{2.34} $$

Making the transformation $s_1 = x_1$, $s_2 = x_1 + x_2$, the
joint density of $S_1$ and $S_2$ is

$$ \phi_{S_1,S_2}(s_1, s_2) = a^2 b^2\, e^{-bs_1}\, e^{-bs_2}\, e^{-a(1 - e^{-bs_2})}. \tag{2.35} $$

In general, it can be shown that the conditional reliability
function of $X_k$, given $S_{k-1} = s$, is given by

$$ R_{X_k|S_{k-1}}(x|s) = e^{-a\{e^{-bs} - e^{-b(s+x)}\}}. \tag{2.36} $$
2.4.2 Conditional Distribution of $X_k$

The conditional cdf and pdf are obtained from Equation (2.36)
by recalling that $R(x) = 1 - \Phi(x)$ and
$\phi(x) = \frac{d}{dx}\Phi(x)$. Thus, we have

$$ \Phi_{X_k|S_{k-1}}(x|s) = 1 - e^{-a\{e^{-bs} - e^{-b(s+x)}\}} \tag{2.37} $$

and

$$ \phi_{X_k|S_{k-1}}(x|s) = a b\, e^{-b(s+x)}\, e^{-a\{e^{-bs} - e^{-b(s+x)}\}}, \tag{2.38} $$

respectively.

As can be seen from the above equations, the time to the next
failure depends on the time when the last failure occurs. It
should be noted that the distributions of times between
failures are improper, i.e.,

$$ \Phi_{X_k|S_{k-1}}(\infty|s) = 1 - e^{-a e^{-bs}} < 1. \tag{2.39} $$

This is due to the fact that the event {no failures in
$(0, \infty)$} is allowed in our model. Hence, the expectations
of these quantities do not exist.
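The conditional reliability in Equation (2.36) and the defect mass implied by Equation (2.39) are simple to compute. The sketch below is illustrative only; the function names and the parameter values are assumptions, not part of the report.

```python
import math


def conditional_reliability(x, s, a, b):
    """R_{X_k|S_{k-1}}(x|s) = exp(-a{e^{-bs} - e^{-b(s+x)}}), Equation (2.36)."""
    return math.exp(-a * (math.exp(-b * s) - math.exp(-b * (s + x))))


def prob_no_further_failure(s, a, b):
    """1 - Phi(infinity|s) = exp(-a e^{-bs}), the mass missing in Equation (2.39)."""
    return math.exp(-a * math.exp(-b * s))


# Illustrative (assumed) values: last failure at s = 250, a = 34, b = 0.006.
a, b, s = 34.0, 0.006, 250.0
for x in (10.0, 100.0, 1000.0):
    print(x, conditional_reliability(x, s, a, b))
print("limit as x grows:", prob_no_further_failure(s, a, b))
```

As x grows, the reliability levels off at a strictly positive value instead of decaying to zero, which is the improperness noted in the text.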
2.4.3 Joint Density of Waiting Times

As defined above, $\{X_k,\ k = 1, 2, \ldots\}$ denotes the
sequence of times between software failures. Then

$$ S_n = \sum_{i=1}^{n} X_i, \qquad n = 1, 2, \ldots $$

is called the waiting time to the nth software failure.

This quantity is quite important for estimation of parameters
a and b and, hence, we obtain the distribution of
$\{S_1, S_2, \ldots, S_n\}$. The distribution is obtained by
using an approach similar to that used for getting Equation
(2.34). The result is summarized in the following theorem.

Theorem. The joint probability density of
$S_1, S_2, \ldots, S_n$ is given by

$$ \phi_{S_1,\ldots,S_n}(s_1, \ldots, s_n) = (ab)^n \cdot e^{-b \sum_{i=1}^{n} s_i} \cdot e^{-a(1 - e^{-bs_n})}, \tag{2.40} $$

where $s_1, s_2, \ldots, s_n$ denote the realizations of
$S_1, S_2, \ldots, S_n$, respectively.

The density can also be written as

$$ \phi_{S_1,\ldots,S_n}(s_1, \ldots, s_n) = e^{-m(s_n)} \prod_{k=1}^{n} \lambda(s_k), \tag{2.41} $$

where $\lambda(s_k) \equiv \frac{d}{ds_k}\{m(s_k)\}$ and
$m(s_k) = a(1 - e^{-bs_k})$. For a proof of this theorem, see
[COX66] and [DON75].

Equation (2.40) will be used later to estimate a and b based
on observed data $\underline{s} = (s_1, \ldots, s_n)$.

2.4.4  Joint  Counting  Probability 

The property of independent increments, along with Equations
(2.8) and (2.12), provides a complete statistical
characterization of the NHPP, so that the joint counting
probability can be determined for any collection of times
$0 < t_1 < t_2 < \cdots < t_n$. That is, with $t_0 = 0$,
$y_0 = 0$,

$$ \begin{aligned}
P\{N(t_1) = y_1, N(t_2) = y_2, \ldots, N(t_n) = y_n\}
&= \prod_{i=1}^{n} P\{N(t_i) - N(t_{i-1}) = y_i - y_{i-1}\} \\
&= \left\{ \prod_{i=1}^{n} \frac{[m(t_i) - m(t_{i-1})]^{\,y_i - y_{i-1}}}{(y_i - y_{i-1})!} \right\} e^{-m(t_n)}.
\end{aligned} \tag{2.42} $$

Equation (2.42) is needed for estimating the parameters a and
b for given data $\{(y_i, t_i),\ i = 1, 2, \ldots, n\}$.
2.5 ESTIMATION OF MODEL PARAMETERS FROM FAILURE DATA

The basic models for the failure process and performance
measures were developed in Sections 2.2 and 2.3, respectively.
To use these models for software performance assessment, the
only parameters to be specified are the total expected number
of errors to be detected, a, and the error detection rate per
error, b. In other words, for given a and b, various useful
quantities can be computed from the relevant equations in
Sections 2.2 and 2.3.

In general, a and b are not known for a specific software
system and are estimated from the available data generated
during testing. However, that is not the only way to get a and
b. One may be willing to extrapolate these values from the
data on one or more "similar" systems. Another method would be
to use a Bayesian approach, whereby knowledge about a and b is
expressed as prior distributions and used for performance
assessment. This approach can also be used in conjunction with
available data and is especially useful when failure data are
scarce or expensive to collect.

The purpose of this section is to describe methods for
estimating a and b from failure data. Use of these methods is
illustrated later via failure data from operational systems.
Such data are generally available as

(i) total number of failures in given time intervals; and/or as
(ii) times between failures.

Most of the available data are given in the form of number of
failures in given time intervals; data on times between
failures are very rare. Nevertheless, both of these cases are
considered below.

2.5.1  Estimation  When  Cumulative  Failures  Are  Given 

We first consider the case when data are available as
cumulative number of failures in given time intervals.
Suppose $y_1, y_2, \ldots, y_n$ are the cumulative number of
failures detected by times $t_1, t_2, \ldots, t_n$,
respectively. This can also be written as data pairs
$\{(y_i, t_i),\ i = 1, 2, \ldots, n\}$. Thus, the number of
failures in time interval $(t_{i-1}, t_i)$ is
$(y_i - y_{i-1})$ for $i = 1, 2, 3, \ldots, n$, where
$t_0 = 0$ and $y_0 = 0$. We will obtain the Maximum Likelihood
Estimates $\hat{a}$ and $\hat{b}$ of a and b, respectively. To
do this, we first write the joint density and obtain the
likelihood function, and then the log-likelihood function.
Next, we take the partial derivatives of the log-likelihood
function with respect to a and b and equate them to zero for
maximization.

The solutions of the resulting two equations are the desired
values $(\hat{a}, \hat{b})$.

Now, to get the joint density, we note that in our notation
$y_1, y_2, \ldots, y_n$ are the observed values of
$N(t_1), N(t_2), \ldots, N(t_n)$, respectively. Hence, from
Equation (2.42),

$$ P\{N(t_1) = y_1, N(t_2) = y_2, \ldots, N(t_n) = y_n\} = \prod_{i=1}^{n} \frac{[m(t_i) - m(t_{i-1})]^{\,y_i - y_{i-1}}}{(y_i - y_{i-1})!}\, e^{-\{m(t_i) - m(t_{i-1})\}}, \tag{2.43} $$

where $m(t_i) = a(1 - e^{-bt_i})$.

It is well known that the likelihood function for the
parameters is simply the joint density of
$y_1, y_2, \ldots, y_n$, with these values considered as known
constants. Substituting for $m(t_i)$ in Equation (2.43), the
likelihood function for (a,b), given the data
$(\underline{t}, \underline{y})$, is

$$ L(a, b \mid \underline{y}, \underline{t}) = \left\{ \prod_{i=1}^{n} \frac{\{a(e^{-bt_{i-1}} - e^{-bt_i})\}^{\,y_i - y_{i-1}}}{(y_i - y_{i-1})!} \right\} e^{-a(1 - e^{-bt_n})}. \tag{2.44} $$

Taking the natural logarithm of Equation (2.44) yields:

$$ \ln L(a, b \mid \underline{y}, \underline{t}) = \sum_{i=1}^{n} (y_i - y_{i-1}) \ln a + \sum_{i=1}^{n} (y_i - y_{i-1}) \ln(e^{-bt_{i-1}} - e^{-bt_i}) - \sum_{i=1}^{n} \ln\{(y_i - y_{i-1})!\} - a(1 - e^{-bt_n}). \tag{2.45} $$

As mentioned above, the maximum likelihood estimates (mle's)
are those values of a and b which maximize
$\ln L(a, b \mid \underline{t}, \underline{y})$, i.e., which
satisfy (for brevity we write L to denote
$L(a, b \mid \underline{t}, \underline{y})$)

$$ \frac{\partial \ln L}{\partial a} = 0 \quad \text{and} \quad \frac{\partial \ln L}{\partial b} = 0. \tag{2.46} $$

By taking the partial derivatives of Equation (2.45) and
equating them to zero, we obtain, after some simplification
(recall that $y_0 = 0$),

$$ a(1 - e^{-bt_n}) = y_n \tag{2.47} $$


and

$$ a\, t_n\, e^{-bt_n} = \sum_{i=1}^{n} \frac{(y_i - y_{i-1})(t_i e^{-bt_i} - t_{i-1} e^{-bt_{i-1}})}{e^{-bt_{i-1}} - e^{-bt_i}}. \tag{2.48} $$


As can be easily seen, all the quantities in Equations (2.47)
and (2.48) are known except a and b, which are to be
estimated. These equations do not yield simple analytical
forms, and we use numerical methods for their solution. The
resulting values of a and b are the mle's $\hat{a}$ and
$\hat{b}$, respectively.
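One possible numerical scheme, sketched here under stated assumptions, is to eliminate a through Equation (2.47), $a = y_n / (1 - e^{-bt_n})$, and solve the resulting single equation in b by bisection. The report does not say which numerical method was used; the function name, the bracket for b, and the synthetic data below are illustrative assumptions.

```python
import math


def fit_grouped(t, y, b_lo=1e-6, b_hi=1.0, iters=200):
    """Solve Equations (2.47) and (2.48) for (a, b) by bisection on b.

    t: increasing observation times t_1..t_n; y: cumulative counts y_1..y_n.
    [b_lo, b_hi] is an assumed bracket that must contain the root.
    """
    ts = [0.0] + list(t)   # prepend t_0 = 0
    ys = [0.0] + list(y)   # prepend y_0 = 0
    tn, yn = ts[-1], ys[-1]

    def g(b):
        # (2.48) with a replaced by y_n / (1 - e^{-b t_n}) from (2.47).
        one_minus = -math.expm1(-b * tn)
        lhs = yn * tn * math.exp(-b * tn) / one_minus
        rhs = 0.0
        for i in range(1, len(ts)):
            e_prev = math.exp(-b * ts[i - 1])
            e_cur = math.exp(-b * ts[i])
            num = ts[i] * e_cur - ts[i - 1] * e_prev
            rhs += (ys[i] - ys[i - 1]) * num / (e_prev - e_cur)
        return lhs - rhs

    lo, hi = b_lo, b_hi
    if g(lo) * g(hi) > 0.0:
        raise ValueError("no sign change in the bracket; widen [b_lo, b_hi]")
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if g(lo) * g(mid) <= 0.0:
            hi = mid
        else:
            lo = mid
    b_hat = 0.5 * (lo + hi)
    a_hat = yn / -math.expm1(-b_hat * tn)   # Equation (2.47)
    return a_hat, b_hat


# Synthetic check: counts set equal to their expected values for a=100, b=0.1.
times = [float(i) for i in range(1, 11)]
counts = [100.0 * (1.0 - math.exp(-0.1 * ti)) for ti in times]
print(fit_grouped(times, counts, 1e-4, 2.0))
```

With counts set exactly to their expected values the score equations are satisfied at the generating parameters, so the sketch should return values close to a = 100 and b = 0.1; real data will of course not reproduce the parameters exactly.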

It should be pointed out that, even though the mle's are the
desired values, it is often useful to study the log-likelihood
surface as a function of the parameters a and b. For given
data, a plot of the log-likelihood surface can be obtained by
evaluating Equation (2.45) over a grid of values of a and b.
A flat surface would indicate a large variability associated
with the mle's, while a sharply peaked surface is an indicator
of low variability. A surface with sharp rises and falls might
cause problems in the numerical solution of Equations (2.47)
and (2.48), while a well-behaved surface would ensure rapid
convergence to the values $\hat{a}$, $\hat{b}$.
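A grid evaluation of the log-likelihood of Equation (2.45) can be sketched as follows. The data set, the grid ranges, and the function names are hypothetical and serve only to show the mechanics; plotting is left out.

```python
import math


def log_likelihood(a, b, t, y):
    """ln L(a, b | y, t) of Equation (2.45) for cumulative counts y at times t."""
    ts = [0.0] + list(t)
    ys = [0] + list(y)
    total = a * math.expm1(-b * ts[-1])          # equals -a(1 - e^{-b t_n})
    for i in range(1, len(ts)):
        d = ys[i] - ys[i - 1]                    # failures in (t_{i-1}, t_i)
        p = math.exp(-b * ts[i - 1]) - math.exp(-b * ts[i])
        total += d * math.log(a) + d * math.log(p) - math.lgamma(d + 1)
    return total


def likelihood_grid(a_values, b_values, t, y):
    """Return rows (one per a value) of ln L over the b values."""
    return [[log_likelihood(a, b, t, y) for b in b_values] for a in a_values]


# Hypothetical data, for illustration only.
t = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [10, 18, 24, 28, 31]
a_values = [30.0 + 5.0 * i for i in range(5)]
b_values = [0.1 * (j + 1) for j in range(5)]
for a, row in zip(a_values, likelihood_grid(a_values, b_values, t, y)):
    print(a, [round(v, 2) for v in row])
```

Scanning the printed rows and columns gives a rough sense of how quickly the surface falls away from its maximum along each axis.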
2.5.1.1 Confidence Region for (a,b)

In addition to the mle's $\hat{a}$, $\hat{b}$, we generally
want to quantify the region in which the true values a, b
might lie with a specified degree of confidence. This is
referred to as obtaining the $100(1-\alpha)\%$ joint
confidence region for (a,b). In general, it is not possible to
get the exact confidence region [FIN76] because the true
distribution of $(\hat{a}, \hat{b})$ is unknown. However,
mle's have the very desirable property that they are
asymptotically normally distributed if the sample size is
large.

Also of great usefulness is the invariance property of the
mle's, i.e., a function of (a,b) can be estimated by using the
mle's $\hat{a}$, $\hat{b}$, and this function will also be an
mle. This will be useful for estimating N(t), R(t), etc.

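Formally, as indicated above, the mle's are normally
distributed for large n, i.e.,

$$ \begin{pmatrix} \hat{a} \\ \hat{b} \end{pmatrix} \sim N\!\left( \begin{pmatrix} a \\ b \end{pmatrix},\ \Sigma_{COV} \right) \quad \text{as } n \to \infty. \tag{2.49} $$

The variance-covariance matrix $\Sigma_{COV}$ represents

$$ \begin{pmatrix} Var(\hat{a}) & Cov(\hat{a}, \hat{b}) \\ Cov(\hat{b}, \hat{a}) & Var(\hat{b}) \end{pmatrix} $$

and is given by

$$ \Sigma_{COV} = \begin{pmatrix} r_{aa} & r_{ab} \\ r_{ba} & r_{bb} \end{pmatrix}^{-1}, \tag{2.50} $$

where

$$ r_{ij} = -E\!\left[ \frac{\partial^2 \ln L}{\partial i\, \partial j} \right], \qquad i, j = a, b. $$

That is,

$$ r_{aa} = -E\!\left[ \frac{\partial^2 \ln L}{\partial a^2} \right], \tag{2.51} $$

$$ r_{ab} = r_{ba} = -E\!\left[ \frac{\partial^2 \ln L}{\partial a\, \partial b} \right], \tag{2.52} $$

$$ r_{bb} = -E\!\left[ \frac{\partial^2 \ln L}{\partial b^2} \right]. \tag{2.53} $$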


Taking the appropriate partial derivatives of Equation (2.45)
and substituting in Equations (2.51), (2.52), and (2.53), we
obtain, after some simplification (recall that
$E[N(t_i)] \equiv m(t_i) = a(1 - e^{-bt_i})$):

$$ r_{aa} = \frac{1}{a} \sum_{i=1}^{n} (e^{-bt_{i-1}} - e^{-bt_i}) = \frac{1 - e^{-bt_n}}{a}, \tag{2.54} $$

$$ r_{ab} = r_{ba} = t_n e^{-bt_n}, \tag{2.55} $$

and

$$ r_{bb} = a \sum_{i=1}^{n} \frac{(t_{i-1} - t_i)^2\, e^{-bt_{i-1}}\, e^{-bt_i}}{e^{-bt_{i-1}} - e^{-bt_i}} - a\, t_n^2\, e^{-bt_n}. \tag{2.56} $$


Substituting these expressions in Equation (2.50), we get the
variance-covariance matrix for $(\hat{a}, \hat{b})$. Thus, the
asymptotic distribution of $(\hat{a}, \hat{b})$ is completely
specified if (a,b) are known. However, in practice, (a,b) are
not known. Therefore, we use their estimates,
$(\hat{a}, \hat{b})$, in Equations (2.49), (2.54), (2.55), and
(2.56) to get estimates of the parameters of the asymptotic
bivariate normal distribution.
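The information terms and the inversion in Equation (2.50) can be sketched in a few lines of Python. The closed forms coded below are the second-derivative expectations of Equation (2.45) for grouped data, namely $r_{aa} = (1 - e^{-bt_n})/a$, $r_{ab} = t_n e^{-bt_n}$, and $r_{bb}$ as a sum over intervals minus $a t_n^2 e^{-bt_n}$. The function names and the example values are assumptions made for illustration.

```python
import math


def information_matrix(a, b, t):
    """Return (r_aa, r_ab, r_bb) for observation times t_1..t_n (t_0 = 0)."""
    ts = [0.0] + list(t)
    tn = ts[-1]
    r_aa = -math.expm1(-b * tn) / a
    r_ab = tn * math.exp(-b * tn)
    r_bb = -a * tn ** 2 * math.exp(-b * tn)
    for i in range(1, len(ts)):
        e_prev = math.exp(-b * ts[i - 1])
        e_cur = math.exp(-b * ts[i])
        r_bb += a * (ts[i] - ts[i - 1]) ** 2 * e_prev * e_cur / (e_prev - e_cur)
    return r_aa, r_ab, r_bb


def covariance_matrix(a, b, t):
    """Invert the 2x2 information matrix; return Var(a), Var(b), Cov, correlation."""
    r_aa, r_ab, r_bb = information_matrix(a, b, t)
    det = r_aa * r_bb - r_ab ** 2
    var_a = r_bb / det
    var_b = r_aa / det
    cov_ab = -r_ab / det
    rho = cov_ab / math.sqrt(var_a * var_b)
    return var_a, var_b, cov_ab, rho


# Illustrative (assumed) values: estimates a = 100, b = 0.1, ten unit intervals.
times = [float(i) for i in range(1, 11)]
print(covariance_matrix(100.0, 0.1, times))
```

In practice the estimates $\hat{a}$, $\hat{b}$ would be plugged in for a and b, as the text describes. Because $r_{ab}$ is positive, the resulting covariance and correlation of the two estimates are negative.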




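Now, the correlation coefficient between $\hat{a}$ and
$\hat{b}$ is estimated as

$$ \hat{\rho}_{\hat{a},\hat{b}} = \frac{\widehat{Cov}(\hat{a}, \hat{b})}{\sqrt{\widehat{Var}(\hat{a}) \cdot \widehat{Var}(\hat{b})}}, \tag{2.57} $$

where $\widehat{Var}(\hat{a})$, $\widehat{Var}(\hat{b})$, and
$\widehat{Cov}(\hat{a}, \hat{b})$ are obtained from Equations
(2.50) to (2.53).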

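Finally, to obtain the $100(1-\alpha)\%$ confidence regions
for a and b, we use the following approximation ([ROU73]):

$$ \ln L(\hat{a}, \hat{b} \mid \underline{y}, \underline{t}) - \ln L(a, b \mid \underline{y}, \underline{t}) = \tfrac{1}{2} \chi^2_{2;\alpha} $$

or

$$ \ln L(a, b \mid \underline{y}, \underline{t}) = \ln L(\hat{a}, \hat{b} \mid \underline{y}, \underline{t}) - \tfrac{1}{2} \chi^2_{2;\alpha}, \tag{2.58} $$

where $\ln L(\hat{a}, \hat{b} \mid \underline{y}, \underline{t})$
represents the value of the log-likelihood function at
$a = \hat{a}$ and $b = \hat{b}$.

Substituting Equation (2.45) in Equation (2.58), we get

$$ \sum_{i=1}^{n} (y_i - y_{i-1}) \ln a + \sum_{i=1}^{n} (y_i - y_{i-1}) \ln(e^{-bt_{i-1}} - e^{-bt_i}) - \sum_{i=1}^{n} \ln\{(y_i - y_{i-1})!\} - a(1 - e^{-bt_n}) = C, \tag{2.59} $$

where

$$ C = \ln L(\hat{a}, \hat{b} \mid \underline{y}, \underline{t}) - \tfrac{1}{2} \chi^2_{2;\alpha}. \tag{2.60} $$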


Equation (2.59) defines a contour of the $100(1-\alpha)\%$
confidence region. For given data, $\hat{a}$, $\hat{b}$, and
$\alpha$, Equation (2.59) can be solved for those values of a
and b which satisfy it. (For computational purposes, it is
easier to take values of a ($\gtrless \hat{a}$) and solve for
the corresponding values of b.)


2.5.2 Estimation When Times Between Failures Are Given

Now we consider the case when data are available in the form
of times between individual failures. As mentioned earlier,
such data are rarely available.

Recall that $X_1, X_2, \ldots, X_n$ denote the times between
failures and $S_n = \sum_{i=1}^{n} X_i$. Then the data are in
the form $\underline{x} = (x_1, x_2, \ldots, x_n)$ and
$s_n = \sum_{i=1}^{n} x_i$. The distribution of times between
failures was discussed in Section 2.4.3 and is obtained from
Equations (2.40) and (2.41), as

$$ \phi_{S_1,\ldots,S_n}(s_1, \ldots, s_n) = (ab)^n \cdot e^{-b \sum_{i=1}^{n} s_i} \cdot e^{-a(1 - e^{-bs_n})}. $$

The likelihood function for a, b, given $\underline{s}$, is
the same as above and can be written as

$$ L(a, b \mid \underline{s}) = (ab)^n \cdot e^{-b \sum_{i=1}^{n} s_i} \cdot e^{-a(1 - e^{-bs_n})}. \tag{2.61} $$


Then the log (natural) likelihood is

$$ \ln L(a, b \mid \underline{s}) = n \ln a + n \ln b - b \sum_{i=1}^{n} s_i - a(1 - e^{-bs_n}). \tag{2.62} $$

To get the maximum likelihood estimates $\hat{a}$, $\hat{b}$,
we take the partial derivatives of Equation (2.62) and equate
them to zero, i.e.,

$$ \frac{\partial \ln L}{\partial a} = 0 \tag{2.63} $$

and

$$ \frac{\partial \ln L}{\partial b} = 0. \tag{2.64} $$

These equations yield

$$ \frac{n}{a} = 1 - e^{-bs_n} \tag{2.65} $$

and

$$ \frac{n}{b} = a\, s_n \cdot e^{-bs_n} + \sum_{i=1}^{n} s_i. \tag{2.66} $$

As in the first case, these equations do not yield simple
analytical solutions and have to be solved numerically. The
solutions of Equations (2.65) and (2.66) are the mle's
$\hat{a}$ and $\hat{b}$.
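As a hedged illustration of one way to solve Equations (2.65) and (2.66), the sketch below substitutes $a = n / (1 - e^{-bs_n})$ from (2.65) into (2.66) and finds b by bisection. The bisection scheme, the bracket, and the function name are assumptions; the data are the first 26 cumulative failure times (in days) listed in Table 2.1.

```python
import math


def fit_failure_times(s, b_lo=1e-6, b_hi=1.0, iters=200):
    """Solve Equations (2.65) and (2.66) for (a, b) given failure times s_1..s_n."""
    n = len(s)
    sn = s[-1]
    total = sum(s)

    def h(b):
        # (2.66) rearranged to h(b) = 0 after eliminating a via (2.65).
        one_minus = -math.expm1(-b * sn)
        return n / b - n * sn * math.exp(-b * sn) / one_minus - total

    lo, hi = b_lo, b_hi
    if h(lo) * h(hi) > 0.0:
        raise ValueError("no sign change in the bracket; widen [b_lo, b_hi]")
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if h(lo) * h(mid) <= 0.0:
            hi = mid
        else:
            lo = mid
    b_hat = 0.5 * (lo + hi)
    a_hat = n / -math.expm1(-b_hat * sn)   # Equation (2.65)
    return a_hat, b_hat


# Cumulative failure times (days) for the first 26 errors of Table 2.1.
ntds = [9, 21, 32, 36, 43, 45, 50, 58, 63, 70, 71, 77, 78, 87, 91, 92,
        95, 98, 104, 105, 116, 149, 156, 247, 249, 250]
a_hat, b_hat = fit_failure_times(ntds)
print(round(a_hat, 2), round(b_hat, 5))
```

For these data the sketch should land close to the estimates reported in Section 2.7. The bracket check raises an error when no sign change is found, which signals that the supplied range for b does not contain a solution.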

Regarding the asymptotic distribution of $(\hat{a}, \hat{b})$,
recall that (see Section 2.4.2) the joint density of
$S_1, \ldots, S_n$ is improper. Therefore, the asymptotic
properties of mle's do not hold in this case.

To obtain the $100(1-\alpha)\%$ confidence regions for (a,b),
we use the same approximation as was used in Section 2.5.1,
viz.

$$ \ln L(\hat{a}, \hat{b} \mid \underline{s}) - \ln L(a, b \mid \underline{s}) = \tfrac{1}{2} \chi^2_{2;\alpha}. \tag{2.67} $$

From Equations (2.62) and (2.67), a contour of the
$100(1-\alpha)\%$ confidence region is obtained as

$$ n \ln a + n \ln b - b \sum_{i=1}^{n} s_i - a(1 - e^{-bs_n}) = C, \tag{2.68} $$

where

$$ C = \ln L(\hat{a}, \hat{b} \mid \underline{s}) - \tfrac{1}{2} \chi^2_{2;\alpha}. \tag{2.69} $$

As before, Equation (2.68) can be solved for given
$\underline{s}$, $\hat{a}$, $\hat{b}$, and $\alpha$ to get the
desired contours.
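Rather than tracing the contour, a simpler check is whether a given point (a,b) lies inside the region bounded by Equation (2.68), i.e., whether its log-likelihood is at least C. The sketch below relies on the fact that a chi-square variable with 2 degrees of freedom has upper tail probability $e^{-c/2}$, so that $\tfrac{1}{2}\chi^2_{2;\alpha} = -\ln \alpha$. The function names are assumptions; the data and the estimates 33.99 and 0.00579 are those of Table 2.1 and Section 2.7.

```python
import math


def log_likelihood(a, b, s):
    """ln L(a, b | s) of Equation (2.62) for failure times s_1..s_n."""
    n = len(s)
    # a * expm1(-b s_n) equals -a(1 - e^{-b s_n}).
    return n * math.log(a) + n * math.log(b) - b * sum(s) + a * math.expm1(-b * s[-1])


def in_confidence_region(a, b, a_hat, b_hat, s, alpha):
    """True if (a, b) lies on or inside the contour of Equation (2.68)."""
    half_chi2 = -math.log(alpha)   # (1/2) chi^2_{2;alpha} for 2 degrees of freedom
    c = log_likelihood(a_hat, b_hat, s) - half_chi2   # Equation (2.69)
    return log_likelihood(a, b, s) >= c


ntds = [9, 21, 32, 36, 43, 45, 50, 58, 63, 70, 71, 77, 78, 87, 91, 92,
        95, 98, 104, 105, 116, 149, 156, 247, 249, 250]
for point in [(33.99, 0.00579), (30.0, 0.006), (34.0, 0.05)]:
    print(point, in_confidence_region(point[0], point[1], 33.99, 0.00579, ntds, 0.05))
```

Scanning a grid of points with this test gives a quick picture of the region without solving Equation (2.68) for b at each a.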


2.6  GOODNESS-OF-FIT  TEST 


In this section, we describe the Kolmogorov-Smirnov
goodness-of-fit test (K-S test) to check whether the NHPP
model developed in Sections 2.2 and 2.5 provides a good fit to
a given set of failure data.

Consider the case when the data are given as a sequence of
software failure times
$\underline{s} = (s_1, s_2, \ldots, s_n)$. We want to test
whether the events $\underline{s}$ are generated from an NHPP.
Suppose that $0 \le S_1 \le S_2 \le \cdots \le S_n$ are the
random times at which the first n events occur in an NHPP with
unknown mean value function m(t). We wish to test the simple
hypothesis

$$ H_0: m(t) = m_0(t) \quad \text{for } t \ge 0, $$

versus

$$ H_1: m(t) \ne m_0(t) \quad \text{for } t \ge 0. $$

Writing $m_0(t) = a_0(1 - e^{-b_0 t})$, the hypothesis $H_0$
can be written as

$$ H_0: m(t) = a_0(1 - e^{-b_0 t}) \quad \text{for } t \ge 0. \tag{2.70} $$

For testing purposes, we need the joint conditional
distribution of the failure times. The following theorem is
useful in deriving this distribution.

Theorem. Given that N(t) = n, the n failure times
$0 \le S_1 \le S_2 \le \cdots \le S_n$ in the interval [0,t]
are random variables whose joint conditional distribution is
the same as the distribution of the order statistics of a
random sample of size n from the distribution

$$ G(x) = \frac{m(x)}{m(t)} \quad \text{for } 0 \le x \le t. $$

For proof of this Theorem, see Cox and Lewis [COX66].

Corollary. Given that $S_n = t$, the (n-1) failure times
$0 \le S_1 \le S_2 \le \cdots \le S_{n-1}$ have the same joint
conditional distribution as the order statistics of a random
sample of size (n-1) from the distribution
$G(x) = m(x)/m(t)$.

This Corollary easily follows from the above Theorem. Using
this Corollary, we reduce the hypothesis of Equation (2.70) to

$$ H_0: G(x) = G_0(x) \equiv \frac{m_0(x)}{m_0(t)} \quad \text{for } 0 \le x \le t. \tag{2.71} $$

For our case we have

$$ H_0: G(x) = \frac{1 - e^{-b_0 x}}{1 - e^{-b_0 t}} \quad \text{for } 0 \le x \le t. \tag{2.72} $$

Note that the expression in Equation (2.72) represents a
truncated exponential distribution.

We now consider the Kolmogorov-Smirnov (K-S) goodness-of-fit
test [ROH76, ROU73]. Given the values of a random sample of
size n-1, $s_1, s_2, \ldots, s_{n-1}$, we define the sample
cdf by $H_{n-1}(x) = k/(n-1)$, where k is the number of sample
values $\le x$. Thus, $H_{n-1}(x)$ is a step function which is
zero for x less than $s_1$, has a jump of 1/(n-1) at each
$s_k$, and is 1 for x greater than or equal to $s_{n-1}$.
That is,

$$ H_{n-1}(x) = \begin{cases} 0, & x < s_1 \\ (k-1)/(n-1), & s_{k-1} \le x < s_k,\ \ k = 2, 3, \ldots, n-1 \\ 1, & x \ge s_{n-1}. \end{cases} \tag{2.73} $$

Since $H_{n-1}$ is a step function and G is monotonically
increasing and continuous, it suffices to examine the absolute
deviations at the sample points $s_k$,
$k = 1, 2, \ldots, n-1$, and then take the maximum of these
(n-1) values. The following procedure is used for calculating
the test statistic D. For each $k = 1, 2, \ldots, n-1$, set

$$ D_k = \max\left\{ \left| G_0(s_k) - \frac{k}{n-1} \right|,\ \left| G_0(s_k) - \frac{k-1}{n-1} \right| \right\}. $$

Then set

$$ D = \max_k \{D_k\}. \tag{2.74} $$

If the value of D calculated in Equation (2.74) is greater
than or equal to the critical value $D_{n-1;\alpha}$, we
reject the null hypothesis $H_0$ that
$S_1, S_2, \ldots, S_{n-1}$ follow $G_0(x)$; otherwise we do
not reject the null hypothesis. The critical values
$D_{n-1;\alpha}$ associated with the K-S test at a level of
significance $\alpha$ are available from statistical tables
[ROH76, p. 661].

It should be noted that, if the parameters of $G_0(x)$ are
estimated from the sample, the K-S test can be used but will
give extremely conservative results. To achieve better
results, the level of significance needs to be adjusted. One
approach suggested by Allen [ALL78] is to test at the 5% level
of significance and use the critical value for the 20% level,
or test at the 1% level and use the critical value for the 10%
level. We will use this approach in our analyses in later
sections.

Another use of the K-S test in our context is in developing
confidence limits for the true cdf G(x). For example, if we
take a random sample of size (n-1) and use it to construct the
sample cdf $H_{n-1}(x)$, then we can be $100(1-\alpha)\%$
confident that the true cdf G(x) does not deviate from
$H_{n-1}(x)$ by more than $D_{n-1;\alpha}$. That is, the
$100(1-\alpha)\%$ confidence limits for G(x) are given by

$$ H_{n-1}(x) - D_{n-1;\alpha} \le G(x) \le H_{n-1}(x) + D_{n-1;\alpha}. \tag{2.75} $$

These limits are especially useful in the case when the
parameters of $G_0(x)$ are to be estimated from the data. For
this case, the null hypothesis $H_0$ will be rejected at a
level of significance $\alpha$ if one or more points of
$G_0(x)$ fall outside the $100(1-\alpha)\%$ confidence limits
given by Equation (2.75). Otherwise, it will not be rejected.
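The computation of the statistic D in Equation (2.74) for the truncated exponential null distribution of Equation (2.72) can be sketched as follows. The function name is an assumption. Following the Corollary, the sketch conditions on $S_n = s_n$ and uses only the first n-1 failure times; the data are the first 26 cumulative failure times of Table 2.1, with $b_0$ set to the estimate 0.00579 reported in Section 2.7.

```python
import math


def ks_statistic(s, b0):
    """K-S statistic D of Equation (2.74) under G_0 of Equation (2.72).

    s: failure times s_1..s_n in increasing order; t is taken as s_n and the
    first n-1 times form the sample, as in the Corollary.
    """
    t = s[-1]
    sample = s[:-1]
    m = len(sample)                         # m = n - 1 sample points
    denom = -math.expm1(-b0 * t)            # 1 - e^{-b0 t}
    d = 0.0
    for k, sk in enumerate(sample, start=1):
        g0 = -math.expm1(-b0 * sk) / denom  # G_0(s_k)
        d = max(d, abs(g0 - k / m), abs(g0 - (k - 1) / m))
    return d


ntds = [9, 21, 32, 36, 43, 45, 50, 58, 63, 70, 71, 77, 78, 87, 91, 92,
        95, 98, 104, 105, 116, 149, 156, 247, 249, 250]
print(round(ks_statistic(ntds, 0.00579), 4))
```

The resulting D is then compared with the tabulated critical value $D_{n-1;\alpha}$, using an adjusted level when $b_0$ has been estimated from the same data, as discussed above.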


2.7 ANALYSIS OF FAILURE DATA FROM NAVAL TACTICAL
DATA SYSTEM (NTDS)

Jelinski and Moranda [JEL72] first analyzed some software
failure data from the U.S. Navy Fleet Computer Programming
Center. Since then, this data set has been used by several
investigators for model validation purposes. In this section,
we analyze the same data set to see how well the NHPP model
describes these failures.

The data set was extracted from information about errors in
the development of software for the real-time, multi-computer
complex which forms the core of the Naval Tactical Data System
(NTDS). The NTDS software consisted of some 38 different
project schedules. Each module was supposed to follow three
stages: the production (or development) phase, the test phase,
and the user phase. Many of the "trouble reports" or "software
anomaly reports" were generated whenever a system-level
symptom of a deficiency was noted by operators or users. A
proper trace back to the exact cause in software of this
symptom was done by personnel familiar with the entire system.
However, Jelinski and Moranda felt that it was better to
analyze the data from isolated modules than from the total
system, because many of the modules did not evolve in the
fashion indicated. One of the larger modules, denoted by
A-module, had the desired pattern. The times (in days) between
failures for this module are shown in Table 2.1. Twenty-six
software faults were found during the production phase and
five additional faults during the test phase. The last fault
was found on 4 Jan 1971. One fault was observed during the
user phase on 20 Sept 1971 and two more faults (4 Oct 1971,
10 Nov 1971) during the test phase. This indicates that a
re-work of the module had taken place after the user error was
found. A more detailed description of the NTDS software can be
found in [JEL72].

Data Analyses

The data in this case are available as times between software
failures, and hence the method described in Section 2.5.2 will
be used for estimation of parameters. We consider the first 26
data points of Table 2.1, for which n = 26 and
$s_{26} = \sum_{k=1}^{26} x_k = 250$ days.

To get an appreciation of the likelihood function associated
with this data set, the log-likelihood from Equation (2.62) is
plotted in Figure 2.2. We see that the surface rises sharply
along the b-axis and is relatively flat along the a-axis.

TABLE 2.1

SOFTWARE FAILURE DATA FROM NTDS

Error No.   Time Between Failures   Cumulative Time
            x_k, days               S_n = Σ x_k, days

Production (Checkout) Phase
    1               9                      9
    2              12                     21
    3              11                     32
    4               4                     36
    5               7                     43
    6               2                     45
    7               5                     50
    8               8                     58
    9               5                     63
   10               7                     70
   11               1                     71
   12               6                     77
   13               1                     78
   14               9                     87
   15               4                     91
   16               1                     92
   17               3                     95
   18               3                     98
   19               6                    104
   20               1                    105
   21              11                    116
   22              33                    149
   23               7                    156
   24              91                    247
   25               2                    249
   26               1                    250

Test Phase
   27              87                    337
   28              47                    384
   29              12                    396
   30               9                    405
   31             135                    540

User Phase
   32             258                    798

Test Phase
   33              16                    814
   34              35                    849


The maximum of this surface is obtained by solving Equations
(2.65) and (2.66). Substituting the appropriate values from
Table 2.1 in Equations (2.65) and (2.66), we get

$$ \frac{26}{a} = 1 - e^{-b(250)} \tag{2.76} $$

and

$$ \frac{26}{b} = a(250) \cdot e^{-b(250)} + 2492. \tag{2.77} $$

Solving Equations (2.76) and (2.77) numerically, we get

$$ \hat{a} = 33.99 $$

and

$$ \hat{b} = 0.00579 $$

as the mle's for a and b, respectively. The fitted mean value
function is

$$ \hat{m}(t) = 33.99(1 - e^{-0.00579t}) \tag{2.78} $$

and is shown in Figure 2.3, along with the actual data
(determination of the confidence bounds will be discussed
later).


Goodness-of-f it  Test 


We now perform the Kolmogorov-Smirnov goodness-of-fit test to check the adequacy of the fitted model. Now, using the Corollary and the results in Section 2.6, we conduct the test based on 26 - 1 = 25 points. The hypothesis, from Equation (2.71), is

$$H_0:\; G_0(x) = \frac{1 - e^{-b_0 x}}{1 - e^{-b_0 (250)}} \quad \text{for } 0 \le x \le 250, \tag{2.79}$$

and the sample cdf is

$$H(x) = \begin{cases} 0, & x < s_1, \\ (k-1)/25, & s_{k-1} \le x < s_k, \quad k = 2, 3, \ldots, 25, \\ 1, & x \ge s_{25}. \end{cases} \tag{2.80}$$


Figure 2.3. Plots of Mean Value Function and 90% Confidence
Bounds for the N(t) Process (NTDS Data)


The values of $s_k$ and $H(s_k)$ are given in Table 2.2. To compute $G_0(s_k)$ for various $s_k$ values, we replace $b_0$ by $\hat{b}$ in Equation (2.79) and obtain Column 4 of Table 2.2. Entries in Columns 5 and 6 are easily obtained from Columns 3 and 4. Now, from Equations (2.47) and (2.79),

$$D = \max_k \left\{\, |G_0(s_k) - H(s_k)|,\; |G_0(s_k) - H(s_{k-1})| \,\right\}.$$

In other words, D is the largest entry in Columns 5 and 6 and is seen to be

$$D = 0.2044.$$


To test at $\alpha = .05$, we use a critical value corresponding to $\alpha = .20$, as discussed in Section 2.6. From statistical tables,

$$D_{25;0.20} = 0.208.$$

Since $D < D_{25;0.20}$, we accept the null hypothesis, $H_0$, at the 5% level of significance.
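The computation of D described above can be sketched in a few lines. This is a minimal illustration that takes the cumulative times from Table 2.1 and the estimate of b quoted in the text; the names `g0` and `ks_statistic` are hypothetical.

```python
import math

# Cumulative failure times s_1..s_25 (days) from Table 2.1; the 26th
# failure time, s_26 = 250, is the conditioning point.
S = [9, 21, 32, 36, 43, 45, 50, 58, 63, 70, 71, 77, 78, 87, 91,
     92, 95, 98, 104, 105, 116, 149, 156, 247, 249]
S_END = 250
B_HAT = 0.00579  # estimate of b quoted in the text


def g0(x, b=B_HAT, s_end=S_END):
    """Hypothesized conditional cdf, Equation (2.79)."""
    return -math.expm1(-b * x) / -math.expm1(-b * s_end)


def ks_statistic(points):
    """Largest of |G0(s_k) - H(s_k)| and |G0(s_k) - H(s_{k-1})|."""
    m = len(points)
    d_max, k_max = 0.0, None
    for k, s_k in enumerate(points, start=1):
        g = g0(s_k)
        d_k = max(abs(g - k / m), abs(g - (k - 1) / m))
        if d_k > d_max:
            d_max, k_max = d_k, k
    return d_max, k_max


D, k_at_max = ks_statistic(S)
CRITICAL_25_20 = 0.208  # table value quoted in the text
print(f"D = {D:.4f} at k = {k_at_max}; accept H0: {D < CRITICAL_25_20}")
```

The sample cdf is evaluated as k/25 at s_k and (k-1)/25 just below it, so both one-sided differences are covered at every jump.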


TABLE 2.2

  k    s_k    H(s_k)   G_0(s_k)   |G_0(s_k)-H(s_k)|   |G_0(s_k)-H(s_{k-1})|
  1      9     0.04     0.0664         0.0264               0.0664
  2     21     0.08     0.1497         0.0697               0.1097
  3     32     0.12     0.2211         0.1011               0.1411
  4     36     0.16     0.2460         0.0860               0.1260
  5     43     0.20     0.2882         0.0882               0.1282
  6     45     0.24     0.2999         0.0599               0.0999
  7     50     0.28     0.3286         0.0486               0.0886
  8     58     0.32     0.3730         0.0530               0.0930
  9     63     0.36     0.3996         0.0396               0.0796
 10     70     0.40     0.4357         0.0357               0.0757
 11     71     0.44     0.4407         0.0007               0.0407
 12     77     0.48     0.4703         0.0097               0.0303
 13     78     0.52     0.4751         0.0449               0.0049
 14     87     0.56     0.5174         0.0426               0.0026
 15     91     0.60     0.5355         0.0645               0.0245
 16     92     0.64     0.5399         0.1001               0.0601
 17     95     0.68     0.5532         0.1268               0.0868
 18     98     0.72     0.5661         0.1539               0.1139
 19    104     0.76     0.5915         0.1685               0.1285
 20    105     0.80     0.5956         0.2044               0.1644
 21    116     0.84     0.6395         0.2005               0.1605
 22    149     0.88     0.7557         0.1243               0.0843
 23    156     0.92     0.7776         0.1424               0.1024
 24    247     0.96     0.9946         0.0346               0.0746
 25    249     1.00     0.9982         0.0018               0.0382

The $100(1-\alpha)\%$ confidence limits for G(x) can now be calculated from Equation (2.75). For example, for $\alpha = 0.05$, we have $D_{25;.05} = 0.264$, so that the lower and upper confidence bounds are

$$L(x) = \max\{H(x) - 0.264,\; 0\}$$

and

$$U(x) = \min\{H(x) + 0.264,\; 1\},$$

where H(x) is given by Equation (2.80). The 95% bounds for G(x), along with $G_0(x)$, are shown in Figure 2.4.

We  see  that  the  fitted  model  seems  to  be  adequate. 

Having  established  that  the  model  provides  a  good 
fit,  various  performance  measures  of  interest  can  be 
obtained  by  substituting  the  estimated  values  of  a  and 
b  in  the  appropriate  equations  of  sections  2.3  and  2.4. 

The estimated mean value function, as given in Equation (2.78), is $\hat{m}(t) = 33.99\,(1 - e^{-0.00579t})$. A plot of $\hat{m}(t)$ and the actual number of faults detected during the production period for this case was given in Figure 2.3. Also shown were the 90% confidence bounds for the N(t) process as computed from Equation (2.15).




The $100(1-\alpha)\%$ confidence regions for a and b are obtained from Equations (2.68) and (2.69) following a procedure similar to the one detailed in Section 2.7. These are shown in Figure 2.5 for $\alpha$ = 0.05, 0.25, and 0.50.

Finally, software reliability, $R_{X_{27}|S_{26}}(x \mid 250)$, can be computed from Equation (2.36). For example, the reliability values after x = 5, 10, 20, and 30 days are
0.796, 0.638, 0.417, and 0.280, respectively. Thus, the
probability  that  the  system  will  operate  without  any 
failures  for  30  additional  days  is  0.28.  As  seen  from 
the  data  in  Table  2.1,  the  system  did  operate  without 
any  failures  for  87  days  subsequent  to  failure  number 
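The reliability values quoted above can be checked with a short calculation. The sketch below assumes the usual NHPP form R(x | s) = exp(-[m(s + x) - m(s)]) with m(t) = a(1 - exp(-b t)); the function name `reliability` is hypothetical, and the parameter values are the estimates quoted in the text.

```python
import math

A_HAT, B_HAT = 33.99, 0.00579  # estimates quoted in the text
S_LAST = 250                   # time of the 26th failure, days


def reliability(x, s=S_LAST, a=A_HAT, b=B_HAT):
    """P{no failure in (s, s + x]} for an NHPP with m(t) = a(1 - exp(-b t)).

    Assumed form: R(x | s) = exp(-[m(s + x) - m(s)]).
    """
    def m(t):
        return a * -math.expm1(-b * t)

    return math.exp(-(m(s + x) - m(s)))


for days in (5, 10, 20, 30):
    print(days, round(reliability(days), 3))
```

The same function can be evaluated at other horizons, for example x = 87 days, to see how small the model makes the probability of the long failure-free run that was actually observed.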


2.8  ANALYSIS  OF  FAILURE  DATA  FROM  A  LARGE  SCALE 

SOFTWARE  SYSTEM 

The data to be analyzed in this section have been taken from a large scale project reported in Thayer et al. [THA76]. This project represents an initial delivery of a large command and control software package written in JOVIAL/J4 (JOVIAL is a higher order language generally used for Air Force Command and Control applications). It consists of 115,346 total source statements and 249 routines. Some other characteristics of this project are summarized in Table 2.3. The software was developed functionally, i.e., the project was divided into work units responsible for different functions. Software testing started with development testing by the development personnel to demonstrate specific functional capabilities, test data extremes, etc.

2.8.1  Failure  Data 

The failure data used for this study are taken from the Software Problem Reports (SPR's) generated during the formal testing phases of this project. Formal testing, which comprises validation and acceptance testing, began after development testing. Validation testing was performed by an independent test group at the subsystem level and demonstrated the approved software performance and requirements. Acceptance testing ran a subset of the Validation tests to demonstrate specific requirements. After Acceptance testing, the software underwent final Integration testing by an independent group. Integration testing demonstrated that the applications software correctly interfaced with the operating system and system support software. Finally, Operational Demonstration testing was done to demonstrate the software in an operational environment using an operational timeline and operational data. The data for this error data set were obtained from the four formal test phases (Validation, Acceptance, Integration, and Operational Demonstration) of the applications software. This is so because the majority of the errors analyzed were detected during formal testing.

TABLE 2.3

Size (total source statements)       115,346
Number of routines                   249
Language                             JOVIAL/J4
Formal Requirements                  To function level
Co-contractor                        Yes
Subcontractor                        No
Operating Mode                       Batch
Formal Testing                       24 weeks
    Validation                       10
    Acceptance                       2
    Integration                      10
    Operational Demonstration        2

The time periods for the various phases of testing are Validation (Jun 1-Aug 12), Acceptance (Aug 13-Aug 24), Integration (Aug 25-Oct 26), and Operational Demonstration (Oct 27-Nov 12). In addition to the above data, operational data spanning a period of approximately nine months was also available and is used for comparison with the predicted values. The only time frame readily available from the data was the calendar day. The data also contain the mistakes by the operators and the "explanatory" errors, i.e., corrections to make a change to a comment statement or those errors for which a "fix" is not to a routine. These explanatory errors may or may not indicate the type of change. Therefore, the original data was restructured into four sets of data denoted by DS1, DS2, DS3, and DS4 [SUK76]. The description and the total number of faults detected during the formal testing phases for each data set are given in Table 2.4.

In this analysis, the number of software faults detected during formal testing is counted on a weekly basis. Also, for each data set, the software faults detected during the first nine weeks are eliminated, because we are interested in analyzing the software failures over the period when they are decreasing. The numbers of SPR's for the 15-week period for the four cases (DS1 to DS4) are given in Table 2.5.

2.8.2  Estimation  of  Parameters 

As seen in Table 2.5, the data for this project
are in the form $(t_1, y_1), (t_2, y_2), \ldots, (t_{15}, y_{15})$, i.e.,
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1138  27  1483  70  2707  54  2362 


as the number of failures in specified time intervals. Hence, the estimates $\hat{a}$ and $\hat{b}$ are obtained by simultaneously solving Equations (2.47) and (2.48). Thus, by substituting the data set DS1 in Equations (2.47) and (2.48) and solving, we get

$$\hat{a} = 1348, \qquad \hat{b} = 0.124,$$

and the fitted mean value function is

$$\hat{m}(t) = 1348\,(1 - e^{-0.124t}), \qquad t \ge 0.$$

This is also an estimate of the expected number of software failures observed by time t. A plot of the actual cumulative number of failures and the fitted values is given in Figure 2.6.

2.8.3  Goodness-of-fit Test

The goodness-of-fit test is now conducted following the procedure discussed in Section 2.6. Since the sample size is 15, the null hypothesis to be tested can be written as
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Figure  2. 
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Actual  and  Expected  Cumulative  Number 
of Failures and 90% Confidence Bounds
for the N(t) Process for Data Set DS1


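$$H_0:\; G_0(t_i) = \frac{1 - e^{-b_0 t_i}}{1 - e^{-b_0 t_{15}}} \quad \text{for } i = 1, 2, \ldots, 15, \tag{2.81}$$

and the sample cdf as

$$H(x) = \begin{cases} 0, & x < t_1, \\ y_{i-1}/y_{15}, & t_{i-1} \le x < t_i, \quad i = 2, 3, \ldots, 15, \\ 1, & x \ge t_{15}. \end{cases} \tag{2.82}$$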


The computed values of H(x) for various $t_i$ are given in column 2 of Table 2.6.

Now we substitute $b_0 = \hat{b} = 0.124$ in Equation (2.81) and compute the value of $G_0(t_i)$ for i = 1, 2, ..., 15. These values are given in column 3 of Table 2.6. Columns 4 and 5 of this table are the quantities needed to find $D = \max_k \{D_k\}$ (see Equation (2.74)). From these columns we find the value of D to be 0.096, corresponding to $t_i = 9$.

To find the critical value corresponding to sample size 15 and $\alpha = .05$, we first note that the parameters had to be estimated in this case. As mentioned in Section 2.6, for a situation like this, a suggested approach is to take $\alpha = .20$ to get good results. From
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TABLE  2 . 6 


DATA FOR KOLMOGOROV-SMIRNOV TEST
(DATA SET DS1)

  i    H(t_i)    G_0(t_i)   |G_0(t_i)-H(t_i)|   |G_0(t_i)-H(t_{i-1})|
  1    0.1784    0.1381          0.0403               0.1381
  2    0.2979    0.2601          0.0378               0.0817
  3    0.4587    0.3679          0.0908               0.0700
  4    0.5000    0.4631          0.0369               0.0044
  5    0.5404    0.5472          0.0068               0.0472
  6    0.6028    0.6215          0.0187               0.0811
  7    0.6503    0.6872          0.0369               0.0844
  8    0.7004    0.7452          0.0448               0.0949
  9    0.7707    0.7964          0.0257               0.0960
 10    0.8269    0.8416          0.0147               0.0709
 11    0.8506    0.8816          0.0310               0.0547
 12    0.8875    0.9169          0.0294               0.0663
 13    0.9359    0.9481          0.0122               0.0606
 14    0.9903    0.9757          0.0146               0.0398
 15    1.0000    1.0000          0.0000               0.0097

the statistical tables [ROH76, p. 661], $D_{15;.20} = 0.266$. The observed value D = 0.096 is less than the critical value 0.266 and hence we accept the null hypothesis of Equation (2.81). Thus we conclude that at the 5% level of significance the model


$$P\{N(t) = y\} = \frac{\{1348\,(1 - e^{-0.124t})\}^{y}}{y!}\, \exp\{-1348\,(1 - e^{-0.124t})\}$$

can be considered to provide an adequate fit to data set DS1.

To further check the adequacy of fit, we compute 95% confidence bounds on $G(t_i)$. From Equation (2.75), these bounds are given by

$$H(t_i) - D_{15;.05} \;\le\; G(t_i) \;\le\; H(t_i) + D_{15;.05}.$$

From the statistical tables, $D_{15;.05} = 0.366$ and hence the 95% confidence bounds are given by $H(t_i) \pm 0.366$. A plot of these bounds and the fitted values is shown in Figure 2.7.
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Figure  2, 
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7 ,  95%  confidence  bounds  for  the  conditional 
c.d.f. G(t_i) and the fitted curve for
DS1 data


To get an appreciation of the variability in the estimated values of a and b, we now construct confidence regions for (a,b). Such regions are given by Equations (2.59) and (2.60). For $\alpha = .05$, the 95% joint confidence region will be the solution of the following equation:

$$\ln L(a, b \mid \mathbf{y}, \mathbf{t}) = \ln L(\hat{a}, \hat{b} \mid \mathbf{y}, \mathbf{t}) - \tfrac{1}{2}\,\chi^2_{2;.05},$$

where

$$\ln L(\hat{a}, \hat{b} \mid \mathbf{y}, \mathbf{t}) = \sum_{i=1}^{15} (y_i - y_{i-1}) \ln(1348) + \sum_{i=1}^{15} (y_i - y_{i-1}) \ln\left(e^{-0.124\,t_{i-1}} - e^{-0.124\,t_i}\right) - \sum_{i=1}^{15} \ln\{(y_i - y_{i-1})!\} - 1348\left(1 - e^{-0.124\,t_{15}}\right).$$

Data $(y_1, t_1), (y_2, t_2), \ldots, (y_{15}, t_{15})$ were given in Table 2.5 and

$$\chi^2_{2;.05} = 0.103.$$


A plot of this region is shown in Figure 2.8. From this plot we see that, even though the most likely values of a and b, based on the data, are $\hat{a} = 1348$, $\hat{b} = 0.124$, the true values can vary over the entire region contained in the 95% contour. Values a = 1450, b = 0.11 will be acceptable (with 95% confidence) and so will a = 1250, b = 0.14. The 50% and 75% confidence regions are also shown in Figure 2.8 and can be similarly interpreted.

2.8.5  Variance-Covariance  Matrix  for  (a,b) 

The variance-covariance matrix is useful in quantifying the variability in the estimated parameters and is obtained from Equations (2.50), (2.54), (2.55), and (2.56) by substituting $a = \hat{a} = 1348$, $b = \hat{b} = 0.124$, and the actual data values from Table 2.5. For data set DS1, we get

$$\Sigma_{\text{cov}} = \begin{pmatrix} 2368 & -0.2071 \\ -0.2071 & 5.554 \times 10^{-5} \end{pmatrix}.$$

From this we have

$$\text{Standard Deviation}(\hat{a}) = \sqrt{\text{Var}(\hat{a})} = 48.66,$$

$$\text{Standard Deviation}(\hat{b}) = \sqrt{\text{Var}(\hat{b})} = 0.00745,$$

$$\text{Correlation Coefficient}(\hat{a}, \hat{b}) = \rho_{\hat{a},\hat{b}} = \frac{-0.2071}{\sqrt{(2368)(5.554 \times 10^{-5})}} = -0.571.$$
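The standard deviations and the correlation coefficient follow directly from the entries of the variance-covariance matrix. A minimal sketch of that arithmetic is given below; the helper name `summarize` is hypothetical.

```python
import math

# Variance-covariance matrix for (a, b) quoted in the text for DS1.
COV = [[2368.0, -0.2071],
       [-0.2071, 5.554e-5]]


def summarize(cov):
    """Return (sd_a, sd_b, correlation) from a 2x2 covariance matrix."""
    sd_a = math.sqrt(cov[0][0])
    sd_b = math.sqrt(cov[1][1])
    rho = cov[0][1] / (sd_a * sd_b)
    return sd_a, sd_b, rho


sd_a, sd_b, rho = summarize(COV)
print(f"sd(a) = {sd_a:.2f}, sd(b) = {sd_b:.5f}, rho = {rho:.3f}")
```

The negative correlation reflects the trade-off visible in the likelihood contours: a larger a can be partly compensated by a smaller b.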


2.8.6  Number  of  Remaining  Errors 


One useful quantity is the estimated number of remaining faults or errors in the system after some time t. This value is obtained from Equation (2.19) as

$$E\{\bar{N}(t)\} = \hat{a}\, e^{-\hat{b}t},$$

or

$$E\{\bar{N}(t)\} = 1348\, e^{-0.124t}.$$


A  plot  of  this  quantity  is  shown  in  Figure  2.9. 

As  expected,  this  value  decreases  with  time.  Also  shown 
is  a  plot  of  the  "actual”  number  of  remaining  errors 
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which  is  based  on  the  assumption  that  all  the  errors 
were  detected  during  36  weeks  of  operation.  It  should 
be  noted  that  this  assumption  is  made  for  illustration 
purposes only and, in general, this may not be the case.


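It would also be interesting to compute confidence bounds on $E\{\bar{N}(t)\}$. Such bounds can be easily computed as follows.

Let $f(a,b)$ denote $E\{\bar{N}(t)\}$. Then, it is well known [ROH76, ROU73] that $100(1-\alpha)\%$ confidence bounds for $f(a,b)$ are given by

$$\left\{ f(\hat{a}, \hat{b}) \pm t_{n-2;\alpha/2} \sqrt{V(f(\hat{a}, \hat{b}))} \right\}, \tag{2.83}$$

where

$$V(f(a,b)) = \left. \begin{pmatrix} \dfrac{\partial f}{\partial a} & \dfrac{\partial f}{\partial b} \end{pmatrix} \Sigma_{\text{cov}} \begin{pmatrix} \dfrac{\partial f}{\partial a} \\[2mm] \dfrac{\partial f}{\partial b} \end{pmatrix} \right|_{a=\hat{a},\; b=\hat{b}} \tag{2.84}$$

and $t_{n-2;\alpha/2}$ is the upper $100(\alpha/2)$ percentage point of the t-distribution with (n-2) degrees of freedom.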
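As an illustration of how these bounds might be evaluated, the sketch below applies the gradient-times-covariance form to f(a, b) = a exp(-b t) for data set DS1. It assumes n = 15 (so 13 degrees of freedom) and takes the t quantile 1.771 from standard tables; the function names are hypothetical, and the figures in the report may have been produced differently.

```python
import math

A_HAT, B_HAT = 1348.0, 0.124       # DS1 estimates quoted in the text
COV = [[2368.0, -0.2071],
       [-0.2071, 5.554e-5]]        # covariance matrix quoted for DS1


def remaining_errors(t, a=A_HAT, b=B_HAT):
    """Expected number of remaining errors, a * exp(-b t)."""
    return a * math.exp(-b * t)


def remaining_errors_bounds(t, t_quantile, a=A_HAT, b=B_HAT, cov=COV):
    """Delta-method bounds in the spirit of Equations (2.83)-(2.84)."""
    f = remaining_errors(t, a, b)
    # Gradient of f(a, b) = a exp(-b t) with respect to (a, b).
    grad = (math.exp(-b * t), -a * t * math.exp(-b * t))
    var = sum(grad[i] * cov[i][j] * grad[j]
              for i in range(2) for j in range(2))
    half_width = t_quantile * math.sqrt(var)
    return f - half_width, f, f + half_width


# t = 15 weeks; 1.771 is the tabulated upper 5% point of Student's t with
# 13 degrees of freedom (n = 15 is assumed here), giving 90% bounds.
low, est, high = remaining_errors_bounds(15, t_quantile=1.771)
print(f"{low:.1f} <= {est:.1f} <= {high:.1f}")
```

The point estimate at t = 15 weeks is about 210, in line with the 209 remaining errors reported for DS1 at the end of Operational Demonstration.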


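To see how reliability varies with time, a plot of R(x|s=15) is shown in Figure 2.10.

To obtain confidence bounds on reliability, we use a procedure similar to the one used for getting bounds on $E\{\bar{N}(t)\}$. Let $g(a,b)$ represent R(x|s=15). Then the confidence bounds are given by

$$\left\{ g(\hat{a}, \hat{b}) \pm t_{n-2;\alpha/2} \sqrt{V(g(\hat{a}, \hat{b}))} \right\}, \tag{2.85}$$

where

$$V(g(a,b)) = \left. \begin{pmatrix} \dfrac{\partial g}{\partial a} & \dfrac{\partial g}{\partial b} \end{pmatrix} \Sigma_{\text{cov}} \begin{pmatrix} \dfrac{\partial g}{\partial a} \\[2mm] \dfrac{\partial g}{\partial b} \end{pmatrix} \right|_{a=\hat{a},\; b=\hat{b}}. \tag{2.86}$$

The 90% confidence bounds computed from these equations for the given data are shown in Figure 2.10.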

Analyses  similar  to  those  for  data  set  DS1  were 
undertaken  for  data  sets  DS2,  DS3,  and  DS4  of  Table  2.5. 
A  summary  of  the  results  is  given  in  Table  2.7. 
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o  =1348 
f>  =  0.124 




TABLE 2.7

A SUMMARY OF DATA ANALYSES

Quantity                                     DS1        DS2        DS3        DS4

$\hat{a}$                                    1348       1823       3958       3446
$\hat{b}$                                    0.124      0.112      0.0768     0.0771
$\sqrt{\text{Var}(\hat{a})}$                 48.7       62.2       147.3      136.6
$\sqrt{\text{Var}(\hat{b})}$                 0.00745    0.00643    0.00460    0.00492
$\rho_{\hat{a},\hat{b}}$                     -0.571     -0.648     -0.856     -0.855
Estimated Number of Remaining Errors
  at the end of Operational Demonstration    209        338        1212       1050
Number of Errors Detected During
  Nine Months of Operation                   198        263        540        475


2.9  ANALYSIS  OF  FAILURE  DATA  FROM  COMMAND  AND 

CONTROL  SYSTEMS 

In  this  section,  we  analyze  software  failure  data 
from  two  real-time  command  and  control  systems,  SYS1  and 
SYS2.  These  data  sets  were  reported  in  [MUS30]  and 
represent  failures  observed  during  the  system  test  phase. 
The  number  of  delivered  object  instructions  for  SYS1  was 
21,700  and  for  SYS2 ,  27,700.  The  number  of  programmers 
for  SYS1  and  SYS2  was  9  and  5,  respectively. 

For  the  first  system,  a  total  of  136  failures  were 
observed  over  25  hours  of  execution  time  and  for  the 
second  system,  the  number  of  failures  was  54  over  31  hours 
of  execution  time.  The  observed  number  of  failures  per 
execution  hour  and  the  cumulative  failures  are  given  in 
Table  2.8.  The  number  of  failures  per  hour  are  plotted 
in  Figures  2.11  and  2.12,  respectively.  The  parameters 
a  and  b  were  estimated  using  Equations  (2.65)  and  (2.66) 
of  Section  2.5  and  are 

SYS1    $\hat{a} = 142.32$    $\hat{b} = 0.125$

SYS2    $\hat{a} = 56.81$     $\hat{b} = 0.097$
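For count data such as the hourly failures of Table 2.8, the estimates can be approximated with a short routine. The sketch below is an assumption about the method, not a transcription of the report's equations: it maximizes the grouped-data Poisson likelihood for m(t) = a(1 - exp(-b t)) by bisection on the score for b, then recovers a from the total count. The function name `fit_nhpp_grouped` and the bracket are hypothetical.

```python
import math

# Failures per execution hour for SYS1, from Table 2.8 (hours 1..25).
SYS1_COUNTS = [27, 16, 11, 10, 11, 7, 2, 5, 3, 1, 4, 7, 2,
               5, 5, 6, 0, 5, 1, 1, 2, 1, 2, 1, 1]


def fit_nhpp_grouped(counts, times=None, lo=1e-4, hi=5.0, iters=200):
    """Illustrative grouped-data MLE for m(t) = a(1 - exp(-b t)).

    counts[i-1] is the number of failures in (t_{i-1}, t_i]; times holds
    the interval end points t_1..t_n (defaults to 1, 2, ..., n).
    """
    n = len(counts)
    ends = list(times) if times is not None else list(range(1, n + 1))
    starts = [0.0] + ends[:-1]
    total, t_n = sum(counts), ends[-1]

    def score(b):
        # Derivative of the profile log-likelihood with respect to b.
        acc = 0.0
        for c, t0, t1 in zip(counts, starts, ends):
            e0, e1 = math.exp(-b * t0), math.exp(-b * t1)
            acc += c * (t1 * e1 - t0 * e0) / (e0 - e1)
        return acc - total * t_n * math.exp(-b * t_n) / -math.expm1(-b * t_n)

    for _ in range(iters):  # bisection on the score
        mid = 0.5 * (lo + hi)
        if score(mid) > 0:
            lo = mid
        else:
            hi = mid
    b_hat = 0.5 * (lo + hi)
    a_hat = total / -math.expm1(-b_hat * t_n)
    return a_hat, b_hat


a_hat, b_hat = fit_nhpp_grouped(SYS1_COUNTS)
print(f"SYS1: a = {a_hat:.2f}, b = {b_hat:.3f}")
```

Under this assumption the SYS1 estimates come out close to the values quoted above; passing the SYS2 counts of Table 2.8 to the same function gives the corresponding fit for the second system.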




TABLE 2.8

FAILURES IN ONE HOUR (EXECUTION TIME) INTERVALS
AND CUMULATIVE FAILURES

              SYS1              SYS2
Hour      No.    Cum.       No.    Cum.

  1       27      27        10      10
  2       16      43         6      16
  3       11      54         4      20
  4       10      64         5      25
  5       11      75         2      27
  6        7      82         1      28
  7        2      84         1      29
  8        5      89         1      30
  9        3      92         0      30
 10        1      93         1      31
 11        4      97         3      34
 12        7     104         7      41
 13        2     106         1      42
 14        5     111         0      42
 15        5     116         0      42
 16        6     122         0      42
 17        0     122         0      42
 18        5     127         4      46
 19        1     128         0      46
 20        1     129         1      47
 21        2     131         1      48
 22        1     132         0      48
 23        2     134         1      49
 24        1     135         1      50
 25        1     136         1      51
 26                          0      51
 27                          1      52
 28                          0      52
 29                          1      53
 30                          0      53
 31                          1      54

FIG. 2.11  PLOT OF THE NUMBER OF FAILURES PER HOUR (SYS1)
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The  fitted  models  for  the  mean  value  function  are: 


SYS1    $\hat{m}(t) = 142.32\,(1 - e^{-0.125t})$

SYS2    $\hat{m}(t) = 56.81\,(1 - e^{-0.097t})$


Plots  of  the  observed  cumulative  failures  and  expected 
failures  (m(t))  are  shown  in  Figures  2.13  and  2.14  for 
SYS1  and  SYS2,  respectively. 

The observed number of remaining errors and the expected number of remaining errors were computed from $(\hat{a} - N(t))$ and $\hat{a}\,e^{-\hat{b}t}$, respectively, and are plotted in Figures 2.15 and 2.16 for SYS1 and SYS2, respectively. The 90% confidence bounds for $\hat{m}(t)$ and $\hat{E}(\bar{N}(t))$ are also shown in Figures 2.13 to 2.16. From a study of these plots, it appears that the fitted models fit the data very well.

Expressions  for  software  reliability  for  the  two 
systems  are  obtained  from  Equation  (2.36)  as 


$$R(x \mid s = 25) = \exp\left[-142.32\left\{e^{-0.125(25)} - e^{-0.125(25+x)}\right\}\right]$$

and

$$R(x \mid s = 31) = \exp\left[-56.81\left\{e^{-0.097(31)} - e^{-0.097(31+x)}\right\}\right].$$


FIG. 2.13  NUMBER OF FAILURES AND 90% CONFIDENCE BOUNDS (SYS1)
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FIG.  2.15  OBSERVED  AND  EXPECTED  NO  OF  REMANING  ERRORS 
WITH 90% CONFIDENCE BOUNDS ON E[N(x)] - SYS1
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FIG.  2.16  OBSERVED  AND  EXPECTED  NUMBER  OF  REMAINING  ERROR  AND 
90% CONFIDENCE BOUNDS ON E[N(x)] - SYS2
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2.10  ANALYSES  OF  VARIOUS  TYPFS  OF  ERRORS  FROM  A 

REAL-TIME  CONTROL  SYSTEM 

In this section, we study the failure data from a real-time control system for a land-based radar system developed by the Raytheon Company [WIL77]. It was developed in a modular fashion (a total of 109 modules) and nearly all modules were written in JOVIAL/J3. (JOVIAL/J3 is the standard programming language for Air Force Command and Control Applications.) The rest of the modules, chiefly the Executive program, were written
in Assembly language.  The whole system has a total of
86,780  lines  and  49,900  Assembly  lines  of  code.  The 
software  system  runs  in  JOVIAL,  Raytheon's  multiproces¬ 
sor  computer  which  consists  of  two  identical  processors 
(one  utilized  as  a  CPU  and  the  other  as  an  I/O  control 
unit),  and  81,920  words  of  24-bit  core  memory.  The 
software  operates  under  the  control  of  a  highly  central¬ 
ized  modular  Executive  program  which  supervises  all  real¬ 
time  activity  on  both  the  CPU  and  IOCU.  The  software 
system  features  a  common  data  base  whose  overall  layout 
is  defined  by  means  of  a  COMPOOL.  During  compile  time, 
the  JOVIAL  compiler  creates  the  necessary  linkages  for 
operational  programs  to  gain  access  to  the  data  base. 
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Testing  of  the  software  system  proceeded  in  three 
phases: unit testing of individual program modules, including the Executive program; integration (build) testing; and operational testing of the system in the field. Unit testing was carried out on a Digital System simulator rather than on the live computer in order to take advantage of the simulator's extensive debugging tools.

On  the  other  hand,  integration  testing,  whose  chief  pur¬ 
pose  was  to  check  out  control  and  data  interfaces  among 
program  modules,  was  done  on  a  real  machine.  Finally, 
operational  testing  was  performed  on  a  series  of  in¬ 
creasingly  demanding  missions  designed  to  exercise  the 
system  and  evaluate  its  response  under  various  loads 
and  physical  environments.  Operational  missions  were 
first  rehearsed  in  conjunction  with  a  mission  simulator, 
then  performed  with  a  full  hardware  complement  under 
actual  field  conditions. 

2.10.1  Error  Data 

Integration testing was responsible for the largest number of Software Problem Reports (SPR's). The SPR forms could be filled out by anyone (systems analyst, programmer,
or user of the software).  SPR's were generated as soon
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1 


as  an  error  (problem)  was  identified  and  were  not  de¬ 
layed  until  a  solution  was  devised  and  tested.  The 
error  data  set  used  in  validating  the  NHPP  model  was 
derived  from  the  SPR's  only  during  the  acceptance  and 
operational  testing  over  a  22  month  period  during  1974- 
1976.  The  data  for  the  entire  33  month  period  will  be 
analyzed  in  Section  3. 

The  error  data  was  categorized  according  to  the 
seriousness  of  the  error  as  well  as  according  to  the 
type  of  error  as  follows. 

Seriousness  of  Error 

(1)  Critical - if the error is impeding the project development;

(2)  Low - if it is not really necessary for a correction to be made for the current development to proceed;

(3)  Improvement - if it is a suggestion for improvement but not necessary for satisfactory operation;

(4)  Medium - of medium severity.

The number of errors for this classification is given in Table 2.9.


Type of Error

A  Computational

B  Logic

D  Data Handling

L  User Requested Changes

M  Preset Data Base

P  Recurrent

E  Others, such as operating system/support software errors, routine/system interface errors, user interface errors, unidentified errors, etc.


The  total  number  of  errors  for  these  categories  are 
given  in  Table  2.10. 

Using the model and estimation technique of Sections 2.2 and 2.5, respectively, the estimated values of a and b were obtained and are also shown in Tables 2.9 and 2.10 for each category of errors. Thus, for critical errors the estimates are $\hat{a} = 73$ and $\hat{b} = 0.067$ and the fitted NHPP is
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TOTAL  999  1046 


Since  the  observed  number  of  critical  errors  in  22 
months  is  56,  this  model  indicates  that  73-56  =  17  cri¬ 
tical  errors  are  still  remaining  in  the  system. 
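The arithmetic behind this statement can be checked against the fitted model. The sketch below assumes the mean value function m(t) = a(1 - exp(-b t)) used throughout this section, with the critical-error estimates quoted in the text; the function name `mean_value` is hypothetical.

```python
import math

A_HAT, B_HAT = 73.0, 0.067   # critical-error estimates quoted in the text
OBSERVED, MONTHS = 56, 22    # observed critical errors and test duration


def mean_value(t, a=A_HAT, b=B_HAT):
    """Assumed fitted mean value function m(t) = a(1 - exp(-b t))."""
    return a * -math.expm1(-b * t)


fitted_by_end = mean_value(MONTHS)
remaining_from_count = A_HAT - OBSERVED          # 73 - 56, as in the text
remaining_from_model = A_HAT * math.exp(-B_HAT * MONTHS)
print(round(fitted_by_end, 1), remaining_from_count,
      round(remaining_from_model, 1))
```

The fitted count at 22 months is close to the 56 errors observed, so the two ways of estimating the remaining critical errors (a minus the observed count, and a exp(-b t)) agree to within a fraction of an error.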

Plots  of  the  actual  and  fitted  values  of  the  number 
of  errors  for  each  category  are  given  in  Figures  2.19  to 
2.22,  respectively.  Comparing  the  actual  and  fitted 
curves,  the  NHPP  model  seems  to  provide  a  satisfactory 
description  of  these  errors. 
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Figure  2.19  Actual  software  errors  with  several 

levels  of  seriousness  during  22-month 
period  of  testing 
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Figure  2.22  The  fitted  mean  value  functions  for 
several  types  of  errors 
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2.11  ANALYSIS  OF  FAILURE  DATA  FROM  THE  APOLLO  PROJECT 

Now  we  analyze  the  failure  data  from  an  on-board 
Apollo  space  flight  software  project  developed  by  the 
Charles  Stark  Draper  Laboratory,  Inc.  [RYD77]  during 
the  years  1967  to  1971.  This  software,  with  a  size  of 
83,866  words,  runs  on  the  Apollo  Guidance  Computer  (AGC) 
(designed  by  MIT/IL)  which  was  used  throughout  all  the 
Apollo,  Skylab,  Apollo-Soyuz,  and  F-8  Phase  I  programs. 
The purpose of the AGC was to compute guidance, targeting, navigation, and control functions for the Apollo space vehicle for all mission phases.

This  software  was  developed  by  a  group  of  guidance, 
navigation,  and  control  engineers,  programmers,  and  test 
engineers.  The  coding  was  done  both  in  the  assembly 
language  of  the  AGC  and  in  the  interpretive  language 
(INTERPRETER)  developed  for  the  project. 

Testing and verification at the laboratory were performed using various facilities, including engineering simulation in the host computer, full scale digital simulation on the host computer, and a hybrid laboratory and system test laboratory that provided real-time execution. Several levels of testing were performed:


Level 1 tests were high order language programs run on the host computer to test algorithms.

Level 2 was the AGC counterpart of these programs.

Level 3 was intended to verify the operation of a complete program or routine, including crew interface and realistic physical environment models.

Level 4 testing was intended to verify mission phases, e.g., ascent, rendezvous.

Level 5 repeated the level 4 tests on the final rope which was released for manufacture.

Level 6 took place after the ropes were released for manufacture and was intended to verify the program using actual mission data and the flight time-line.

The  hybrid  and  system  test  laboratories  were  exten¬ 
sively  used  in  parallel  with  digital  simulation  for 
level  3,  4,  5,  and  6  tests.  Levels  1  and  2  were  perform¬ 
ed  exclusively  on  the  digital  or  engineering  simulators. 

Changes  to  the  software  (as  a  result  of  software 
errors)  were  controlled  by  the  following  documents: 

-  Program  Change  Request  (PCR) 

-  Program  Change  Notice  (PCN) 


-  Anomaly  Report 

-  Assembly  Control  Board  Request 

The  error  data  set  which  was  derived  from  these  documents 
was  categorized  according  to  types  and  is  summarized  in 
Table  2.11. 

The  estimates  of  the  model  parameters  for  each  cate¬ 
gory  and  the  total  were  obtained  by  the  method  of  Section 
2.5  and  are  given  in  Table  2.11.  The  likelihood  surface 
(for  total  errors)  is  shown  in  Figure  2.23  and  a  plot  of 
the  contours  of  this  surface  in  the  (a-b)  plane  is  given 
in  Figure  2.24.  From  these  figures,  we  note  that  the 
surface  is  really  well  behaved. 

Plots  of  the  observed  and  estimated  total  number  of 
failures  over  the  35  month  period  are  shown  in  Figures 
2.25  and  2.26,  respectively.  Again,  a  comparison  of  the 
two  sets  of  figures  indicates  that  the  model  provides  an 
excellent  fit  to  the  data. 
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FIGURE  2.25. 


OF  SOFTWARE  ERRORS  (APOLLO) 


2.12  ANALYSIS  OF  DATA  FROM  A  LARGE  AVIONICS  REAL-TIME 

SYSTEM 

The  software  from  which  this  error  (failure)  data 
is  taken  is  a  large  avionics  real-time  system  for  DOD 
developed  by  the  Boeing  Aerospace  Company  [FRI77].  It 
consists  of  40,640  lines  of  JOVIAL/J3B  instructions  and 
84,065  assembly  language  instructions.  This  system  was 
not  developed  in  modular  fashion. 

The  whole  system  consists  of  a  controls  and  displays 
subsystem,  a  hardware  test  monitor,  two  system  functions, 
and  an  executive  system  which  schedules  the  former  func¬ 
tions.  The  software  consists  of  5  major  functional  areas 
in  the  operational  software  and  two  functional  areas  in 
the  simulation  software.  The  software  was  designed  so 
that,  if  one  Avionic  Control  Unit  breaks  down,  the  sys¬ 
tem  can  still  provide  the  basic  functional  capabilities. 
The  simulator,  which  runs  on  two  separate  computers, 
allows  testing  to  take  place  in  the  laboratory. 

Testing of this software began with Module Verification Testing (MVT) performed by each module's developer. No Software Problem Reports (SPR's) were issued during MVT because, as far as configuration management is concerned, the software was not released yet. Upon completion of MVT, the developers released the modules for
formal testing.  Formal testing began with Inter-Module
Compatibility  Testing  (IMCT)  where  the  software  was 
checked  against  its  functional  requirements  as  a  total 
unit.  Upon  completion  of  IMCT,  the  software  development 
group  gave  the  software  system  to  an  independent  system 
test  group  for  System  Validation  Testing  (SVT)  where 
acceptance  testing  for  quality  control  purposes  was  per¬ 
formed.  When  an  error  was  discovered  during  testing, 
the  usual  procedure  was  to  patch  the  program.  Software 
errors  were  documented  on  software  problem  reports  (SPR) 
while  requirement  errors  were  reported  on  Design  Change 
Requests. The data set obtained for this analysis came
from the two formal test phases and covered both the
operational and simulation software for the first two
versions (called blocks) of the software system.

Time to fix an error was calculated from the number of
days an SPR was open, assuming 8 hours per day of
equipment use for fixing. These 8 hours were divided
among the errors open on any one day, and the resulting
fractional times were summed over the days the SPR was
open to give the total time spent fixing an error.
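
The time-to-fix rule described above can be sketched in a few lines. This is only an illustration of the stated rule: the function name `time_to_fix`, the SPR identifiers, and the representation of each SPR as a set of day numbers are assumptions, not part of the original data reduction.

```python
from collections import defaultdict


def time_to_fix(open_days, hours_per_day=8.0):
    """Split each day's equipment hours equally among the SPRs open that day.

    open_days maps a (hypothetical) SPR identifier to the set of day
    numbers on which that SPR was open.
    """
    # Count how many SPRs are open on each day.
    load = defaultdict(int)
    for days in open_days.values():
        for day in days:
            load[day] += 1
    # Sum each SPR's share of the daily hours over the days it was open.
    return {
        spr: sum(hours_per_day / load[day] for day in days)
        for spr, days in open_days.items()
    }


if __name__ == "__main__":
    # Two illustrative SPRs that overlap on day 2.
    print(time_to_fix({"SPR-A": {1, 2}, "SPR-B": {2, 3, 4}}))
```

With this rule the fix times of all SPRs add up to 8 hours for every day on which at least one SPR was open.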
The error data set for the analysis was collected
during  the  period  October  1974  to  August  1975  on  a 
monthly  basis.  The  errors  were  categorized  into  the 
following  groups: 

(1)  Critical 

(2)  Low 

(3)  Improvement 

(4)  Medium 

(5)  Other 

The  total  number  of  errors  for  each  severity  level 
over  an  eleven  month  period  and  the  corresponding  esti¬ 
mated  values  of  a  and  b  are  given  in  Table  2.12.  Plots 
of  the  observed  and  fitted  number  of  errors  are  shown 
in  Figures  2.27  and  2.28,  respectively.  Again,  the  model 
appears  to  provide  a  very  good  fit  to  the  failure  data. 


TABLE 2.12

NUMBER OF ERRORS BY SEVERITY

Severity       Total Errors         a          b

Critical             28          33.40     0.1657
Low                  51          63.20     0.1495
Improvement         211         260.02     0.1517
Medium              357         501.99     0.1129
Other               780        1031.43     0.1283

TOTAL              1427        1880.71     0.1293
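
As a quick consistency check on Table 2.12, the sketch below evaluates the Section 2 mean value function m(t) = a(1 - e^{-bt}) at the end of the eleven-month period for each severity level. It assumes that t is measured in months; the helper names are illustrative only.

```python
import math

# (severity, observed total, a, b) as listed in Table 2.12
TABLE_2_12 = [
    ("Critical", 28, 33.40, 0.1657),
    ("Low", 51, 63.20, 0.1495),
    ("Improvement", 211, 260.02, 0.1517),
    ("Medium", 357, 501.99, 0.1129),
    ("Other", 780, 1031.43, 0.1283),
    ("TOTAL", 1427, 1880.71, 0.1293),
]


def fitted_total(a, b, t):
    """Mean value function of the two-parameter model, a(1 - exp(-b t))."""
    return a * (1.0 - math.exp(-b * t))


def compare(rows, months=11):
    """Return (severity, observed, fitted, expected remaining) per row."""
    out = []
    for name, observed, a, b in rows:
        fitted = fitted_total(a, b, months)
        out.append((name, observed, fitted, a - fitted))
    return out


if __name__ == "__main__":
    for name, observed, fitted, remaining in compare(TABLE_2_12):
        print(f"{name:12s} observed={observed:5d} "
              f"fitted={fitted:8.1f} remaining={remaining:7.1f}")
```

Under the monthly-time assumption, the fitted totals agree with the observed totals to within one error for every row, which is what the maximum likelihood equation for a would lead one to expect.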


SECTION  3 


SOFTWARE  FAULT  OCCURRENCE  PROCESS  WITH 
INCREASING/DECREASING  ERROR  DETECTION  RATE 

3.1  INTRODUCTION

As  discussed  earlier,  many  stochastic  models  have 
been  developed  during  the  past  ten  years  to  describe  the 
fault  occurrence  phenomenon  in  a  large  scale  software 
system.  Most  of  these  models  are  based  on  the  assumption 
that  the  time  between  system  failures  follows  an  exponen¬ 
tial  distribution  with  a  parameter  that  depends  either  on 
the  number  of  faults  remaining  in  the  system  or  on  the 
elapsed  execution  or  calendar  time.  A  summary  of  these 
models  and  a  comparative  list  of  the  features  of  some  of 
these  models  was  given  in  section  1.5. 

All the models that have been proposed to date make
an important assumption about the monotonicity of the
software failure rate. In particular, it has been assumed
that the software system experiences an improvement with
time. In other words, the existing models assume that
the software has a decreasing failure rate (DFR). However,
in practice, it has been observed that many software
systems first experience an increasing failure rate (during
the initial phases of integration) and then follow a
decreasing failure rate.

In this section we develop a new model which incorporates
this dynamic behavior of software systems.
The basic model is presented in section 3.2 and various
software effectiveness measures are developed in section
3.3. Software reliability and related results are given
in section 3.4. Methods for estimating the parameters
of the model from software failure data are described in
section 3.5. Analyses of software failure data from a
large scale system and the Naval Tactical Data System are
presented in sections 3.6 and 3.7, respectively.

Data  sets  from  numerous  other  systems  were  analyzed 
to  assess  the  applicability  of  this  model.  Also,  goodness- 
of-fit  tests  were  conducted  following  the  method  discussed 
in  section  2.6.  Details  of  these  analyses  and  tests  are 
not  reported  here  for  the  sake  of  brevity.  In  all  of  the 
cases  studied  the  model  reported  here  was  found  to  provide 
an  excellent  fit  to  the  observed  failure  history. 
3.2  MODEL DEVELOPMENT


In order to develop an appropriate model, we study
the stochastic behavior of the fault detection phenomenon
by focusing our attention on the number of faults detected
by some arbitrary time t. Let N(t) denote the number
of faults detected by time t and let m(t) be the expected
value of N(t), i.e.,

$$m(t) = E[N(t)]. \tag{3.1}$$

The function m(t) is called the mean value function
of the N(t) process. It should be pointed out that here
time t can be calendar time, execution time, or any other
suitable and consistent measure of time. In practice,
however, we have found calendar time and CPU time to be
the most commonly used measures.


3.2.1  Assumptions 


We  now  consider  the  behavior  of  the  software  fault 
detection  process  as  described  by  N(t). 

(i)  There will be no faults detected at the beginning
     of the fault detection process, i.e., we
     have N(0) = 0. This also implies

     $$m(0) = 0. \tag{3.2}$$

(ii) The software system must contain a finite number
     of faults. In other words, if testing were to be
     continued indefinitely, the number of faults
     detected would be finite, so that the expected
     number of faults eventually found, $m(\infty)$,
     is finite. Let

     $$m(\infty) = a < \infty. \tag{3.3}$$

(iii) The faults to be detected are such that each
     one affects the failure occurrence phenomenon
     independently of the others, but the rate at which
     each fault causes the system to fail depends
     on elapsed time. This can be expressed by
     taking the hazard rate z(t) of each fault to
     be

     $$z(t) = b c t^{c-1}. \tag{3.4}$$

     Note that the shape of this function will depend
     on the values of the parameters b and c.


3.2.2  Expression  for  m(t) 

Based on the above description of the fault detection
process, we now develop an expression for m(t). In
terms of m(t), the hazard rate at time t is defined, for a
small interval $\Delta t$, as

$$z(t) = \frac{m(t+\Delta t) - m(t)}{\Delta t\,\{a - m(t)\}}.$$

Substituting for z(t) from Equation (3.4), we get

$$\frac{m(t+\Delta t) - m(t)}{\Delta t\,\{a - m(t)\}} = b c t^{c-1}. \tag{3.5}$$

By letting $\Delta t \to 0$ in the above equation, we get a
first-order linear differential equation

$$m'(t) + b c t^{c-1} m(t) = a b c t^{c-1}. \tag{3.6}$$

To solve the above equation for m(t), we need to
use the following result.
Lemma. If P(t) and Q(t) are two continuous functions
of t, then the general solution of an equation of the
form

$$y' + P(t)\,y = Q(t) \tag{3.7}$$

is

$$y = \frac{1}{h(t)} \int Q(t)\,h(t)\,dt, \tag{3.8}$$

where

$$h(t) = e^{\int P(t)\,dt} \tag{3.9}$$

and the indefinite integral in (3.8) includes an arbitrary
constant of integration.


Proposition. Under the boundary condition m(0) = 0,
the solution of equation (3.6) is given by

$$m(t) = a\,(1 - e^{-bt^{c}}). \tag{3.10}$$


Proof. Let the functions P(t) and Q(t) in the above
Lemma be

$$P(t) = b c t^{c-1}$$

and

$$Q(t) = a b c t^{c-1}.$$

Then h(t) is obtained from (3.9) as

$$h(t) = e^{\int P(t)\,dt}$$

or

$$h(t) = e^{bt^{c}} \tag{3.11}$$

and

$$\int Q(t)\,h(t)\,dt = \int a b c t^{c-1} e^{bt^{c}}\,dt$$

or

$$\int Q(t)\,h(t)\,dt = a e^{bt^{c}} + k, \tag{3.12}$$

where k is a constant to be determined by the boundary
condition m(0) = 0. Finally, we get the solution of
(3.6) by substituting (3.11) and (3.12) into (3.8),
i.e.,

$$m(t) = e^{-bt^{c}}\,(a e^{bt^{c}} + k)$$

or

$$m(t) = a + k e^{-bt^{c}}. \tag{3.13}$$

Since m(0) = 0, we have

$$m(0) = a + k = 0$$

or

$$k = -a.$$

Substituting k = -a in (3.13), we get the result of
Equation (3.10).
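
The result can be checked numerically. The sketch below evaluates the residual of the differential equation (3.6) for the closed form (3.10), using a central finite difference for m'(t). The parameter values in the demo are arbitrary illustrations, not estimates from this report.

```python
import math


def mean_value(t, a, b, c):
    """Closed-form solution (3.10): m(t) = a(1 - exp(-b t^c))."""
    return a * (1.0 - math.exp(-b * t ** c))


def ode_residual(t, a, b, c, h=1e-5):
    """Residual of (3.6): m'(t) + b c t^(c-1) m(t) - a b c t^(c-1)."""
    # Central difference approximation of m'(t).
    dm = (mean_value(t + h, a, b, c) - mean_value(t - h, a, b, c)) / (2.0 * h)
    p = b * c * t ** (c - 1.0)
    return dm + p * mean_value(t, a, b, c) - a * p


if __name__ == "__main__":
    # Arbitrary illustrative parameters.
    for t in (0.5, 1.0, 5.0, 10.0):
        print(t, ode_residual(t, a=100.0, b=0.05, c=1.5))
```

The residuals are zero up to finite-difference error, and `mean_value(0, ...)` returns 0, in agreement with the boundary condition (3.2).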

3.2.3  Fault  Detection  Rate 

Fault detection rate is the number of faults detected per
unit time. Let $\lambda(t)$ denote the software fault
detection rate so that, for a small time interval $\Delta t$,
$\lambda(t)\Delta t$ represents the expected number of software
faults detected during $(t, t+\Delta t)$. Now m(t) is the
expected number of faults detected by t and

$$\lambda(t) = m'(t). \tag{3.14}$$

From Equations (3.10) and (3.14), we get

$$\lambda(t) = a b\,e^{-bt^{c}} \cdot c\,t^{c-1}$$

or

$$\lambda(t) = \alpha\,t^{\gamma-1} e^{-\beta t^{\gamma}}, \tag{3.15}$$

where

$$\alpha = abc, \qquad \beta = b, \qquad \gamma = c. \tag{3.16}$$

In order to see the shape of the fault detection rate
$\lambda(t)$, we differentiate Equation (3.15) with respect to
t and equate the result to zero and get

$$\alpha\,t^{\gamma-2} e^{-\beta t^{\gamma}} \{(\gamma-1) - \beta\gamma t^{\gamma}\} = 0. \tag{3.17}$$

We see that, for $\gamma > 1$, $\lambda(t)$ is a unimodal function
with

$$\lambda(0) = \lambda(\infty) = 0,$$

and its maximum value occurs at $t = t_m$ where

$$t_m = \left(\frac{\gamma-1}{\beta\gamma}\right)^{1/\gamma}. \tag{3.18}$$

The maximum value of $\lambda(t)$ is

$$\max_t \lambda(t) = \lambda(t_m) = \alpha \left(\frac{\gamma-1}{\beta\gamma}\right)^{(\gamma-1)/\gamma} e^{-(\gamma-1)/\gamma}. \tag{3.19}$$

In other words, the error detection rate of software or,
equivalently, the software failure rate, increases during
the period $(0, t_m)$, achieves its maximum value $\lambda(t_m)$
at $t = t_m$, and then decreases for $t > t_m$, eventually
becoming zero at $t = \infty$. Note that if $0 < \gamma \le 1$, then the
software failure rate is monotonically decreasing. From
the above discussion, we see that the software fault
detection rate $\lambda(t)$ is increasing/decreasing if $\gamma > 1$, and
monotonically decreasing if $0 < \gamma \le 1$.

The extreme values are $\lambda(0) = \alpha$ for $\gamma = 1$ and
$\lambda(0) = \infty$ for $0 < \gamma < 1$.
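
The shape of the fault detection rate can be explored with a short sketch. The functions below evaluate the rate in the form of Equation (3.15) and the peak location and height in the form of Equations (3.18) and (3.19); the function names and the demo parameters are illustrative assumptions.

```python
import math


def detection_rate(t, alpha, beta, gamma):
    """Fault detection rate, Eq. (3.15)."""
    return alpha * t ** (gamma - 1.0) * math.exp(-beta * t ** gamma)


def peak_time(beta, gamma):
    """Time of the maximum detection rate, Eq. (3.18); needs gamma > 1."""
    if gamma <= 1.0:
        raise ValueError("the rate has an interior peak only for gamma > 1")
    return ((gamma - 1.0) / (beta * gamma)) ** (1.0 / gamma)


def peak_rate(alpha, beta, gamma):
    """Maximum detection rate, Eq. (3.19)."""
    r = (gamma - 1.0) / gamma
    return alpha * ((gamma - 1.0) / (beta * gamma)) ** r * math.exp(-r)


if __name__ == "__main__":
    # Illustrative values: a = 100, b = 0.05, c = 2.
    a, b, c = 100.0, 0.05, 2.0
    alpha, beta, gamma = a * b * c, b, c
    tm = peak_time(beta, gamma)
    print("peak at t =", tm, "with rate", peak_rate(alpha, beta, gamma))
```

Evaluating `detection_rate` slightly before and after `peak_time` gives smaller values than `peak_rate`, which is a simple way to confirm the unimodal shape for gamma > 1.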


3.2.4  Failure  Counting  Process 

Now we assume that the failure counting process N(t)
has the following characteristics:

(i)  N(t) has independent increments, i.e.,
     $\{N(t_2) - N(t_1)\}$ is independent of
     $\{N(t_3) - N(t_2)\}$ for any $t_1 < t_2 < t_3$.

(ii) The probabilities associated with the N(t)
     process are as follows:

$$P\{N(t+\Delta t) - N(t) = k\} =
\begin{cases}
1 - \lambda(t)\Delta t + o(\Delta t), & k = 0 \\
\lambda(t)\Delta t + o(\Delta t), & k = 1 \\
o(\Delta t), & k \ge 2
\end{cases} \tag{3.20}$$

It is well known that with the above properties and with
$\lambda(t)$ as given in Equation (3.15), the N(t) process is a
non-homogeneous Poisson process (NHPP) with a mean value
function m(t) given in Equation (3.10). Hence, the
distribution of N(t) is given by

$$P\{N(t) = y\} = \frac{\{m(t)\}^{y}}{y!}\,e^{-m(t)}, \qquad y = 0, 1, 2, \ldots \tag{3.21}$$

3.3  SOFTWARE EFFECTIVENESS MEASURES

In  this  section,  we  develop  expressions  for  several 
useful  quantitative  measures  for  assessing  the  software 
system  effectiveness. 


3.3.1  Distribution  of  the  Number  of  Faults  Detected 
or  Failures  Observed 

As indicated above, N(t) is an NHPP with probability
mass function

$$P\{N(t) = y\} = \frac{\{a(1 - e^{-bt^{c}})\}^{y}}{y!}\,e^{-a(1 - e^{-bt^{c}})}, \qquad y = 0, 1, 2, \ldots \tag{3.22}$$

As $t \to \infty$, we have

$$P\{N(\infty) = y\} = \frac{a^{y}}{y!}\,e^{-a}, \qquad y = 0, 1, 2, \ldots \tag{3.23}$$

This last expression tells us that, if the system were
to be used for a long time ($t = \infty$), the number of faults
detected or failures observed during this time follows
a Poisson distribution with mean a.

3.3.2  Number of Faults Remaining in the System


Let $\bar{N}(t)$ denote the number of faults not detected
by time t, i.e., the number of faults remaining in the
system. Clearly, this number will be obtained by
subtracting N(t) from $N(\infty)$, the number of faults to be
eventually detected. Note that these quantities are
random variables. Thus, we have

$$\bar{N}(t) = N(\infty) - N(t) \tag{3.24}$$

and

$$E[\bar{N}(t)] = E[N(\infty)] - E[N(t)]$$

or

$$E[\bar{N}(t)] = a - a(1 - e^{-bt^{c}})$$

or

$$E[\bar{N}(t)] = a\,e^{-bt^{c}}. \tag{3.25}$$

3.3.3  Conditional Distribution of $\bar{N}(t)$


If we have already observed y faults, it is useful
to know the distribution of the number of faults yet to
be detected. In other words, the conditional
distribution of $\bar{N}(t)$, given that N(t) = y, is

$$P\{\bar{N}(t) = x \mid N(t) = y\} = \frac{P\{\bar{N}(t) = x,\ N(t) = y\}}{P\{N(t) = y\}}. \tag{3.26}$$

Now the event $\bar{N}(t) = x$ denotes occurrences over the time
interval $(t, \infty)$ while the event N(t) = y denotes
occurrences over the interval (0, t), i.e., these two events
represent non-overlapping time intervals. From a basic
property of the NHPP, such events are independent
of each other, so that we have

$$P\{\bar{N}(t) = x \mid N(t) = y\} = P\{\bar{N}(t) = x\}, \qquad x = 0, 1, 2, \ldots \tag{3.27}$$

or

$$P\{N(\infty) - N(t) = x \mid N(t) = y\} = \frac{\{m(\infty) - m(t)\}^{x}}{x!}\,e^{-\{m(\infty) - m(t)\}}.$$

Or, substituting for $m(\infty)$ and m(t) from Equation (3.10),
we get

$$P\{N(\infty) - N(t) = x \mid N(t) = y\} = \frac{\{a - a(1 - e^{-bt^{c}})\}^{x}}{x!}\,e^{-\{a - a(1 - e^{-bt^{c}})\}}.$$

This yields

$$P\{\bar{N}(t) = x \mid N(t) = y\} = \frac{\{a e^{-bt^{c}}\}^{x}}{x!}\,e^{-a e^{-bt^{c}}}. \tag{3.28}$$

Finally, the expected number of faults to be detected,
given N(t) = y, is

$$E[\bar{N}(t) \mid N(t) = y] = a\,e^{-bt^{c}}. \tag{3.29}$$
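
The distribution of the remaining faults can be evaluated directly. The sketch below computes the Poisson probabilities in the form of Equation (3.28) and the mean in the form of Equation (3.29); the function names and the demo parameters are illustrative assumptions.

```python
import math


def expected_remaining(t, a, b, c):
    """Expected number of faults remaining at time t, a exp(-b t^c)."""
    return a * math.exp(-b * t ** c)


def remaining_pmf(x, t, a, b, c):
    """P{x faults remain at time t}: Poisson with the mean above."""
    mu = expected_remaining(t, a, b, c)
    # Work in logs to avoid overflow in mu**x and x! for large x.
    return math.exp(x * math.log(mu) - mu - math.lgamma(x + 1))


if __name__ == "__main__":
    # Arbitrary illustrative parameters.
    a, b, c, t = 10.0, 0.1, 1.5, 4.0
    print("expected remaining:", expected_remaining(t, a, b, c))
    print("P{no faults remain}:", remaining_pmf(0, t, a, b, c))
```

Note that, under the model, these probabilities do not depend on the number y of faults already observed; y enters only through the parameter estimates.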


3.3.4  Joint Counting Probability


The property of independent increments, along with
the equations developed above, provides a complete
statistical characterization of the NHPP so that
the joint probability of a certain number of faults
occurring in given time intervals is obtained as follows.
Consider times $t_1, t_2, \ldots, t_n$ such that
$0 < t_1 < t_2 < \cdots < t_n$. We have, with $t_0 = 0$, $y_0 = 0$,

$$P\{N(t_1) = y_1,\ N(t_2) = y_2,\ \ldots,\ N(t_n) = y_n\}
= \prod_{i=1}^{n} P\{N(t_i) - N(t_{i-1}) = y_i - y_{i-1}\} \tag{3.30}$$

$$= \prod_{i=1}^{n} \frac{\{m(t_i) - m(t_{i-1})\}^{y_i - y_{i-1}}}{(y_i - y_{i-1})!}\,e^{-\{m(t_i) - m(t_{i-1})\}}.$$

Equation (3.30) will be used for estimating the
parameters a, b, and c from given failure data in later
sections.

3.4  SOFTWARE RELIABILITY AND DISTRIBUTION OF TIME
     BETWEEN SOFTWARE FAILURES

The time between failures is a stochastic process
whose behavior is governed by many factors such as the
usage of the system, system load, degree of purification
of software, etc. However, since it is not presently
feasible to quantify the effects of these factors
individually, we model the process behavior as
described above, i.e., by an NHPP with an
increasing/decreasing fault detection rate. At any given
point, the time to next failure will depend on the time
when the last failure occurred. Suppose that the (k-1)st
failure occurred at some time $S_{k-1} = s$. Then the
probability that the kth failure will not occur for an
additional time $X_k = x$, i.e., the conditional reliability
for time x, is as follows:

$$R_{X_k|S_{k-1}}(x|s) = e^{-a\{e^{-bs^{c}} - e^{-b(s+x)^{c}}\}}. \tag{3.31}$$

Since the conditional cumulative distribution function
(cdf) is related to conditional reliability by

$$F_{X_k|S_{k-1}}(x|s) = 1 - R_{X_k|S_{k-1}}(x|s), \tag{3.32}$$

we have

$$F_{X_k|S_{k-1}}(x|s) = 1 - e^{-a\{e^{-bs^{c}} - e^{-b(s+x)^{c}}\}}. \tag{3.33}$$

The conditional probability density function (pdf) is
obtained from Equation (3.33) by differentiating
$F_{X_k|S_{k-1}}(x|s)$ with respect to x and is given by

$$f_{X_k|S_{k-1}}(x|s) = a b c\,(s+x)^{c-1} e^{-b(s+x)^{c}}\,e^{-a\{e^{-bs^{c}} - e^{-b(s+x)^{c}}\}}. \tag{3.34}$$

Finally, we are also interested in the joint pdf of the
cumulative times to failures, i.e., in the joint pdf of
$S_1, S_2, \ldots, S_n$. Following the approach given in Section 2,
we get

$$f_{S_1, \ldots, S_n}(s_1, \ldots, s_n) = e^{-m(s_n)} \prod_{k=1}^{n} \lambda(s_k), \tag{3.35}$$

where $m(\cdot)$ and $\lambda(\cdot)$ are given by Equations (3.10)
and (3.15), respectively.

3.5  ESTIMATION OF MODEL PARAMETERS FROM FAILURE DATA


In this section, we describe methods for estimating
the parameters a, b, and c (or, equivalently, $\alpha$, $\beta$,
and $\gamma$) from available data on software failures. Such
data are generally available either as the cumulative number
of failures in given time intervals, or as times between
software failures. The estimation procedure is different
for each case and is described below. In this report,
we use the method of maximum likelihood for estimation
purposes.


3.5.1  Maximum  Likelihood  Estimation  When  Data  on 
Cumulative  Software  Failures  are  Given 

Let $y_1$ be the number of failures observed during a
time interval $(0, t_1)$, $y_2$ during the interval $(0, t_2)$, and
so on. In general, let $y_i$ be the number of failures by
time $t_i$. Then the observed data in this case will consist
of pairs $(t_i, y_i)$, $i = 1, 2, \ldots, n$. Now the probability
of observing $(y_i - y_{i-1})$ failures during a time interval
$(t_{i-1}, t_i)$ is given by (see Equation (3.30)),

$$P\{N(t_i) - N(t_{i-1}) = y_i - y_{i-1}\}
= \frac{\{m(t_i) - m(t_{i-1})\}^{y_i - y_{i-1}}}{(y_i - y_{i-1})!}\,e^{-\{m(t_i) - m(t_{i-1})\}}. \tag{3.39}$$

Since the increments in the number of failures during
the non-overlapping time periods $(0, t_1)$, $(t_1, t_2)$, ...,
$(t_{i-1}, t_i)$, ..., $(t_{n-1}, t_n)$ are independent of each other,
the joint probability of the pairs of observations
$(t_1, y_1), \ldots, (t_n, y_n)$, i.e.,
$P\{N(t_1) = y_1, N(t_2) = y_2, \ldots, N(t_n) = y_n\}$,
can be written as

$$P\{N(t_1) = y_1\} \cdot P\{N(t_2) - N(t_1) = y_2 - y_1\} \cdots
P\{N(t_n) - N(t_{n-1}) = y_n - y_{n-1}\}$$

$$= \prod_{i=1}^{n} \frac{\{m(t_i) - m(t_{i-1})\}^{y_i - y_{i-1}}}{(y_i - y_{i-1})!}\,
e^{-\sum_{i=1}^{n} \{m(t_i) - m(t_{i-1})\}}$$

or

$$P\{N(t_1) = y_1, N(t_2) = y_2, \ldots, N(t_n) = y_n\}
= \prod_{i=1}^{n} \frac{\{m(t_i) - m(t_{i-1})\}^{y_i - y_{i-1}}}{(y_i - y_{i-1})!}\,e^{-m(t_n)}. \tag{3.40}$$

From this, the likelihood function for parameters
a, b, and c, corresponding to the observations $(t_i, y_i)$,
$i = 1, 2, \ldots, n$, is obtained as

$$L(a, b, c \mid (t_1, y_1), (t_2, y_2), \ldots, (t_n, y_n))
= \prod_{i=1}^{n} \frac{\{m(t_i) - m(t_{i-1})\}^{y_i - y_{i-1}}}{(y_i - y_{i-1})!}\,e^{-m(t_n)}. \tag{3.41}$$

Taking the natural logarithm on both sides of Equation
(3.41), the log likelihood is obtained as

$$\ell(a, b, c \mid (t_i, y_i),\ i = 1, 2, \ldots, n) = \ln L(a, b, c \mid (t_i, y_i),\ i = 1, 2, \ldots, n)$$

$$= \sum_{i=1}^{n} (y_i - y_{i-1}) \ln\{m(t_i) - m(t_{i-1})\} - m(t_n) - \sum_{i=1}^{n} \ln (y_i - y_{i-1})!. \tag{3.42}$$


On substituting for $m(t_{i-1})$, $m(t_i)$, and $m(t_n)$ from
Equation (3.10), and simplifying, the log-likelihood function
becomes

$$\ell(a, b, c \mid (t_i, y_i),\ i = 1, 2, \ldots, n)
= \sum_{i=1}^{n} (y_i - y_{i-1}) \ln\{a(e^{-bt_{i-1}^{c}} - e^{-bt_i^{c}})\}
- a(1 - e^{-bt_n^{c}}) - \sum_{i=1}^{n} \ln (y_i - y_{i-1})!. \tag{3.43}$$


It is well known that the maximum likelihood estimates
(mle's) $\hat{a}$, $\hat{b}$, and $\hat{c}$ are those values of a, b,
and c, respectively, that maximize the likelihood function
given in Equation (3.41), or equivalently, are
those values that maximize the log likelihood function
of Equation (3.43). Thus, $\hat{a}$, $\hat{b}$, and $\hat{c}$ are those values
that simultaneously satisfy the following equations:

$$\frac{\partial \ell}{\partial a} = 0, \tag{3.44}$$

$$\frac{\partial \ell}{\partial b} = 0, \tag{3.45}$$

$$\frac{\partial \ell}{\partial c} = 0. \tag{3.46}$$


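On taking the derivatives of Equation (3.43) with respect
to a, b, and c, and substituting in Equations
(3.44), (3.45), and (3.46), respectively, we get the
following non-linear simultaneous equations:

$$y_n = a(1 - e^{-bt_n^{c}}), \tag{3.47}$$

$$a\,t_n^{c}\,e^{-bt_n^{c}} = \sum_{i=1}^{n}
\frac{(y_i - y_{i-1})\,(t_i^{c} e^{-bt_i^{c}} - t_{i-1}^{c} e^{-bt_{i-1}^{c}})}
{e^{-bt_{i-1}^{c}} - e^{-bt_i^{c}}}, \tag{3.48}$$

$$a\,t_n^{c} (\ln t_n)\,e^{-bt_n^{c}} = \sum_{i=1}^{n}
\frac{(y_i - y_{i-1})\,\{t_i^{c} (\ln t_i) e^{-bt_i^{c}} - t_{i-1}^{c} (\ln t_{i-1}) e^{-bt_{i-1}^{c}}\}}
{e^{-bt_{i-1}^{c}} - e^{-bt_i^{c}}}, \tag{3.49}$$

where the term $t_0^{c} \ln t_0$ is taken as zero for $t_0 = 0$.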

The set of simultaneous equations (3.47), (3.48), and
(3.49) can be solved numerically for a, b, and c. The
solution will be the required maximum likelihood
estimates $\hat{a}$, $\hat{b}$, and $\hat{c}$ of a, b, and c, respectively.
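
A rough numerical illustration of this estimation step follows. Instead of solving the three likelihood equations with a root finder, the sketch eliminates a through Equation (3.47) and searches a coarse grid of (b, c) values for the largest log likelihood in the form of Equation (3.42). The grid search, the function names, and the synthetic data are assumptions chosen for brevity; they are not the numerical procedure used in this report.

```python
import math


def mean_value(t, a, b, c):
    """m(t) = a(1 - exp(-b t^c)), Eq. (3.10)."""
    return a * (1.0 - math.exp(-b * t ** c))


def log_likelihood(a, b, c, times, counts):
    """Log likelihood for cumulative counts at increasing times."""
    ll = 0.0
    prev_t, prev_y = 0.0, 0.0
    for t, y in zip(times, counts):
        dm = mean_value(t, a, b, c) - mean_value(prev_t, a, b, c)
        dy = y - prev_y
        ll += dy * math.log(dm) - math.lgamma(dy + 1.0)
        prev_t, prev_y = t, y
    return ll - mean_value(times[-1], a, b, c)


def profile_a(b, c, times, counts):
    """Solve Eq. (3.47) for a, given b and c."""
    return counts[-1] / (1.0 - math.exp(-b * times[-1] ** c))


def grid_mle(times, counts, b_grid, c_grid):
    """Return (log likelihood, a, b, c) for the best grid point."""
    best = None
    for b in b_grid:
        for c in c_grid:
            a = profile_a(b, c, times, counts)
            ll = log_likelihood(a, b, c, times, counts)
            if best is None or ll > best[0]:
                best = (ll, a, b, c)
    return best


if __name__ == "__main__":
    # Synthetic, noise-free "data" generated from known parameters.
    times = list(range(1, 25))
    counts = [mean_value(t, 1000.0, 0.02, 1.5) for t in times]
    print(grid_mle(times, counts,
                   [0.010, 0.015, 0.020, 0.025, 0.030],
                   [1.0, 1.25, 1.5, 1.75, 2.0]))
```

In practice a grid point found this way would only serve as a starting value for an iterative solver applied to Equations (3.47) to (3.49).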


Once the estimates of a, b, and c have been obtained
from the data, the performance measures, as derived
in sections 3.3 and 3.4, can be easily computed by
substituting $\hat{a}$, $\hat{b}$, and $\hat{c}$ for a, b, and c, respectively.

In order to obtain confidence bounds on the performance
measures, we need to know the distribution of the
estimates $\hat{a}$, $\hat{b}$, and $\hat{c}$. For a reasonably large sample size
n, say $n \ge 20$, the maximum likelihood estimators generally
follow a normal distribution. Thus the vector $(\hat{a}\ \hat{b}\ \hat{c})'$
will have a trivariate normal distribution (TVN) with
$(a\ b\ c)'$ as the vector of means and $\Sigma_{cov}$ as the
variance-covariance matrix. In other words, for large n,

$$(\hat{a}\ \hat{b}\ \hat{c})' \sim \mathrm{TVN}\big((a\ b\ c)',\ \Sigma_{cov}\big). \tag{3.50}$$


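The variance-covariance matrix $\Sigma_{cov}$ represents

$$\Sigma_{cov} = \begin{pmatrix}
\mathrm{Var}(\hat{a}) & \mathrm{Cov}(\hat{a},\hat{b}) & \mathrm{Cov}(\hat{a},\hat{c}) \\
\mathrm{Cov}(\hat{b},\hat{a}) & \mathrm{Var}(\hat{b}) & \mathrm{Cov}(\hat{b},\hat{c}) \\
\mathrm{Cov}(\hat{c},\hat{a}) & \mathrm{Cov}(\hat{c},\hat{b}) & \mathrm{Var}(\hat{c})
\end{pmatrix} \tag{3.51}$$

and is given by

$$\Sigma_{cov} = [r_{ij}]^{-1}, \tag{3.52}$$

where

$$r_{ij} = -E\left[\frac{\partial^{2} \ell}{\partial i\,\partial j}\right], \qquad i, j = a, b, c. \tag{3.53}$$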

Thus, to obtain $\Sigma_{cov}$, we first take the derivatives of
Equation (3.43) and then the expectations of the resulting
expressions, as indicated by Equation (3.53). On
so doing, we get the following expressions for various
$r_{ij}$'s, $i, j = a, b, c$.

$$r_{aa} = \frac{1}{a} \sum_{i=1}^{n} (e^{-bt_{i-1}^{c}} - e^{-bt_i^{c}}), \tag{3.54}$$

$$r_{ab} = r_{ba} = t_n^{c}\,e^{-bt_n^{c}}, \tag{3.55}$$

$$r_{ac} = r_{ca} = b\,t_n^{c} (\ln t_n)\,e^{-bt_n^{c}}. \tag{3.56}$$

The variance-covariance matrix $\Sigma_{cov}$ is obtained by
substituting the appropriate values from Equations (3.54)
to (3.59) into Equation (3.53). Confidence bounds on
the performance measures can then be computed by using
the properties of a trivariate normal distribution.


3.5.2  Maximum Likelihood Estimation of Parameters When
Data on Times Between Software Failures are Given


Sometimes failure data are given as a sequence of
failure times $s_1, s_2, \ldots, s_n$ where $s_k$, $k = 1, 2, \ldots, n$,
represents the time of the kth failure. Using the joint
density of $S_1, S_2, \ldots, S_n$, as given in Equation (3.35),
the likelihood function of a, b, and c for given data
$s_1, s_2, \ldots, s_n$ is

$$L(a, b, c \mid s_1, s_2, \ldots, s_n)
= e^{-a(1 - e^{-bs_n^{c}})} \prod_{k=1}^{n} (abc)\,s_k^{c-1} e^{-bs_k^{c}}. \tag{3.60}$$


As before, the maximum likelihood estimates are
those values which maximize the likelihood function of
Equation (3.60). Since maximizing the likelihood is
equivalent to maximizing the log-likelihood function,
we take the natural logarithm of Equation (3.60) and get

$$\ell(a, b, c \mid s_1, s_2, \ldots, s_n) = \ln L(a, b, c \mid s_1, s_2, \ldots, s_n)$$

$$= n \ln a + n \ln b + n \ln c + (c-1) \sum_{k=1}^{n} \ln s_k
- b \sum_{k=1}^{n} s_k^{c} - a(1 - e^{-bs_n^{c}}). \tag{3.61}$$

Then, the mle's are those values $\hat{a}$, $\hat{b}$, $\hat{c}$ which satisfy
the following equations:

$$\frac{\partial \ell}{\partial a} = 0, \tag{3.62}$$

$$\frac{\partial \ell}{\partial b} = 0, \tag{3.63}$$

$$\frac{\partial \ell}{\partial c} = 0. \tag{3.64}$$

On taking the derivatives of Equation (3.61) and
substituting in Equations (3.62), (3.63), and (3.64),
respectively, we get

$$n = a(1 - e^{-bs_n^{c}}), \tag{3.65}$$

$$n = b\left\{\sum_{k=1}^{n} s_k^{c} + a\,s_n^{c}\,e^{-bs_n^{c}}\right\}, \tag{3.66}$$

and

$$\frac{n}{c} + \sum_{k=1}^{n} \ln s_k = b \sum_{k=1}^{n} s_k^{c} \ln s_k
+ a b\,s_n^{c} (\ln s_n)\,e^{-bs_n^{c}}. \tag{3.67}$$

The above simultaneous, non-linear equations can be
solved numerically to get the maximum likelihood
estimates $\hat{a}$, $\hat{b}$, and $\hat{c}$.

Since the joint distribution of $(S_1, S_2, \ldots, S_n)$ is
an improper distribution, as discussed in section 2,
the asymptotic properties of the mle's do not hold in
this case.
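
For failure-time data, the log likelihood of Equation (3.61) is simple to code. The sketch below also solves Equation (3.65) for a when b and c are held fixed, which reduces a numerical search to two dimensions. The failure times and parameter values in the demo are made up for illustration.

```python
import math


def log_likelihood_times(a, b, c, s):
    """Log likelihood (3.61) for ordered failure times s[0] < ... < s[-1]."""
    n = len(s)
    return (n * math.log(a * b * c)
            + (c - 1.0) * sum(math.log(x) for x in s)
            - b * sum(x ** c for x in s)
            - a * (1.0 - math.exp(-b * s[-1] ** c)))


def a_given_bc(b, c, s):
    """Value of a satisfying Eq. (3.65) for fixed b and c."""
    return len(s) / (1.0 - math.exp(-b * s[-1] ** c))


if __name__ == "__main__":
    # Hypothetical failure times (arbitrary time units).
    s = [1.0, 3.0, 4.0, 7.0, 9.0, 12.0]
    b, c = 0.05, 1.2
    a_hat = a_given_bc(b, c, s)
    print(a_hat, log_likelihood_times(a_hat, b, c, s))
```

For fixed b and c the log likelihood is concave in a, so the value returned by `a_given_bc` is the maximizer in that direction.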


3.6  ANALYSIS OF FAILURE DATA FROM A LARGE SCALE
     SOFTWARE SYSTEM

Failure  data  generated  during  formal  testing  of  a 
large scale software system [THA76] was analyzed in
section  2.8  using  a  two  parameter  non-homogeneous  Poisson 
process  model.  In  that  analysis,  the  data  from  the 
first  9  of  the  24  weeks  of  testing  had  to  be  dropped 
because  during  this  period  the  system  exhibited  an  in¬ 
creasing  failure  rate.  The  model  developed  in  this 
section  is  capable  of  modelling  an  increasing/decreas¬ 
ing  failure  rate  and  will  be  employed  to  develop  a 
model  for  the  failure  data  over  the  entire  24  week 
testing  period. 

The  number  of  failures  per  week  for  the  four  data 
sets  are  given  in  Table  3.1  and  a  plot  for  data  set  DS1 
is  shown  in  Figure  3.1.  It  is  readily  seen  that  the 
failure  rate  increases  for  about  the  first  nine  weeks 
and  then  decreases  until  the  end  of  testing. 

3.6.1  Estimation  of  Parameters 

TABLE 3.1.  Software Failure Data From Thayer et al. [THAT76]

2118     59   2510    111   4044    107   3652
2180     84   2594    253   4297    231   3883
2191     27   2621     70   4367     54   3937

The data are given in the form of points $(t_i, y_i)$,
$i = 1, 2, \ldots, 24$, where $t_i$ refers to time in weeks
and $y_i$ is the number of failures in week i. To estimate
the parameters a, b, and c, we use the method of
section 3.5.1 and substitute the data values for each
set in Equations (3.47), (3.48), and (3.49). The
estimates $\hat{a}$, $\hat{b}$, and $\hat{c}$ for the four sets are given in Table
3.2, and the fitted mean value functions for the four
data sets are as follows:


DS1:  $m(t) = 2352\,(1 - e^{-0.0232\,t^{1.494}})$

DS2:  $m(t) = 2873\,(1 - e^{-0.0182\,t^{1.540}})$

DS3:  $m(t) = 5182\,(1 - e^{-0.0135\,t^{1.547}})$

DS4:  $m(t) = 4657\,(1 - e^{-0.0156\,t^{\hat{c}}})$


A plot of the cumulative number of observed software
failures is given in Figure 3.2 and the expected
cumulative number of failures (m(t)) for each data set
is shown in Figure 3.3.


TABLE 3.2.  A Summary of Data Analysis for DS1-DS4


FIGURE 3.2.  Plots of the Cumulative Number of
Software Failures for DS1 to DS4 (horizontal axis: time in weeks).


3.6.2  Confidence  Bounds 


By using the normal approximation to the Poisson
distribution in Equation (3.22) we compute the 90% confidence
bounds for the N(t) process. The estimated mean value
function and 90% bounds for data set DS1 for the N(t)
process are plotted in Figure 3.4. From this figure we see
that most of the observed points fall within the 90% bounds,
implying that the model described in Section 3.2 fits the
entire history of software errors very well. The number of
remaining errors at t = 24 (weeks), given that 2191 errors
were found by this time, is estimated from Equation (3.25)
and we have

$$E[\bar{N}(24) \mid N(24) = 2191] = 2352\,e^{-0.0232\,(24)^{1.494}} = 161.9.$$


Note that a total of 198 errors were detected during the
one year period of operational demonstration so that the
predicted number is close to the actual value. The
variance-covariance matrix is obtained from Equation (3.51)
and is

$$\hat{\Sigma}_{cov} = \begin{pmatrix}
3067 & 0.0191 & -0.627 \\
0.0191 & 3.53 \times 10^{-6} & -6.09 \times 10^{-5} \\
-0.627 & -6.09 \times 10^{-5} & 1.25 \times 10^{-3}
\end{pmatrix}.$$


FIGURE 3.4.  Estimated Mean Value Function and 90%
Confidence Bounds for the N(t) Process (horizontal axis: time in weeks).


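From this matrix we obtain the estimated standard
deviations and the appropriate correlation coefficients for
$\hat{a}$, $\hat{b}$, and $\hat{c}$. These values for data sets DS1 to DS4 are
also shown in Table 3.2. By using the above
variance-covariance matrix we can also obtain $100(1-\alpha)\%$
confidence bounds for $E\bar{N}(t)$, which are given by

$$\left\{ f(\hat{a}, \hat{b}, \hat{c}) \pm t_{n-3;\,\alpha/2} \sqrt{\mathrm{Var}(f(\hat{a}, \hat{b}, \hat{c}))} \right\},$$

where

$$\mathrm{Var}\{f(\hat{a}, \hat{b}, \hat{c})\} =
\left. \begin{pmatrix} \dfrac{\partial f}{\partial a} & \dfrac{\partial f}{\partial b} & \dfrac{\partial f}{\partial c} \end{pmatrix}
\Sigma_{cov}
\begin{pmatrix} \dfrac{\partial f}{\partial a} \\[4pt] \dfrac{\partial f}{\partial b} \\[4pt] \dfrac{\partial f}{\partial c} \end{pmatrix}
\right|_{a=\hat{a},\ b=\hat{b},\ c=\hat{c}}.$$

For this case, $f(a, b, c) = E\bar{N}(t) = a e^{-bt^{c}}$, and we have

$$\frac{\partial f}{\partial a} = e^{-bt^{c}}, \qquad
\frac{\partial f}{\partial b} = -a\,t^{c}\,e^{-bt^{c}}, \qquad
\frac{\partial f}{\partial c} = -a b\,t^{c} (\ln t)\,e^{-bt^{c}}.$$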
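
The variance formula above can be sketched as follows for the expected number of remaining errors. The covariance entries are the three-significant-digit DS1 values quoted in the text, so the resulting variance is only indicative. The default multiplier z = 1.645 is the normal 95th percentile, used here as a stand-in for the Student-t quantile of the text; that substitution and all function names are assumptions of this sketch.

```python
import math

# Estimated variance-covariance matrix of (a, b, c) for DS1,
# with entries as quoted in the text (rounded values).
SIGMA_DS1 = [
    [3067.0, 0.0191, -0.627],
    [0.0191, 3.53e-6, -6.09e-5],
    [-0.627, -6.09e-5, 1.25e-3],
]


def remaining(t, a, b, c):
    """f(a, b, c) = a exp(-b t^c), the expected number of remaining errors."""
    return a * math.exp(-b * t ** c)


def gradient(t, a, b, c):
    """Partial derivatives of f with respect to a, b and c."""
    e = math.exp(-b * t ** c)
    tc = t ** c
    return [e, -a * tc * e, -a * b * tc * math.log(t) * e]


def delta_variance(grad, sigma):
    """Quadratic form grad' * sigma * grad."""
    return sum(grad[i] * sigma[i][j] * grad[j]
               for i in range(3) for j in range(3))


def bounds(t, a, b, c, sigma, z=1.645):
    """Return (lower, estimate, upper) for f at time t."""
    f = remaining(t, a, b, c)
    sd = math.sqrt(delta_variance(gradient(t, a, b, c), sigma))
    return f - z * sd, f, f + z * sd


if __name__ == "__main__":
    print(bounds(24.0, 2352.0, 0.0232, 1.494, SIGMA_DS1))
```

Because the three largest terms of the quadratic form nearly cancel, the computed variance is sensitive to rounding in the covariance entries; more digits would be needed for a dependable interval.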

The 90% confidence bounds for $E\bar{N}(t)$ for data set DS1 are
computed from the above equations and are shown in Figure
3.5. Also shown is a plot of the actual number of
remaining errors during the 24 week period. From this
figure we see that the actual errors fall within the 90%
bounds.

Similarly, by setting

$$f(\hat{a}, \hat{b}, \hat{c}) = \hat{R}_{X_k|S_{k-1}}(x|s)
= e^{-\hat{a}\{e^{-\hat{b}s^{\hat{c}}} - e^{-\hat{b}(s+x)^{\hat{c}}}\}}$$

we can estimate the software reliability for given
debugging time s.

The $100(1-\alpha)\%$ confidence bounds on $R_{X_k|S_{k-1}}(x|s)$
can be obtained as for $E\bar{N}(t)$. The reliability plots
and 90% confidence bounds for DS1 are shown in Figure
3.6.


3.7  ANALYSIS  OF  FAILURE  DATA  FROM  NAVAL  TACTICAL 
DATA  SYSTEM  (NTDS) 

Failure data from NTDS were analyzed in Section 2.7
using a two parameter NHPP model. In this section we
reanalyze the same data by using the three parameter
NHPP model of Section 3.2. (For details of the system
and data set, see Section 2.7.) Using the Newton-Raphson
method to solve the likelihood equations for a, b, and c
based on the first 26 failures of Table 2.1, we get
$\hat{a} = 27.2$, $\hat{b} = 0.000783$, and $\hat{c} = 1.50$, so that

$$\hat{m}(t) = \hat{a}\,(1 - e^{-\hat{b}t^{\hat{c}}}) = 27.2\,(1 - e^{-0.000783\,t^{1.5}}).$$

The  bounds  of  the  N(t)  process  can  be  obtained  by  using 
normal  approximation  to  a  Poisson  distribution  of  Equa¬ 
tion  (3.22).  The  estimated  mean  value  function  and  90% 
bounds  of  the  N(t)  process  for  this  data  set  are  shown 
in  Figure  3.7.  Also  shown  is  a  plot  of  the  actual  num¬ 
ber  of  errors  detected  by  time  t.  From  this  figure  we 
see  that  all  the  data  points  fall  within  the  90%  bounds. 
We can estimate the expected number of errors remaining
at time t by substituting the mle's in Equation (3.25),
i.e.,

$$E\bar{N}(t) = \hat{a}\,e^{-\hat{b}t^{\hat{c}}}$$

or

$$E\bar{N}(t) = 27.2\,e^{-0.000783\,t^{1.5}}.$$

Thus for t = 250,

$$E\bar{N}(250) = 1.23.$$

That is, we can expect about one more error remaining at
t = 250 (days). The conditional reliability of the time to
the next (27th) failure, given $S_{26} = 250$, is computed
as

$$\hat{R}_{X_{27}|S_{26}}(x|250) = e^{-\hat{a}\{e^{-\hat{b}(250)^{\hat{c}}} - e^{-\hat{b}(250+x)^{\hat{c}}}\}}
= e^{-27.2\{0.0453 - e^{-0.000783\,(250+x)^{1.5}}\}}.$$

For the values of x = 10, 20, and 50 (days) the
reliability values are 0.81, 0.68, and 0.46, respectively.
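
The NTDS figures quoted above can be recomputed from the reported estimates. The sketch below evaluates the expected number of remaining errors and the conditional reliability using the three-parameter model; only the function names are assumptions.

```python
import math

# Estimates reported in the text for the NTDS data.
A_HAT, B_HAT, C_HAT = 27.2, 0.000783, 1.5


def expected_remaining(t, a=A_HAT, b=B_HAT, c=C_HAT):
    """Expected number of errors remaining at time t, a exp(-b t^c)."""
    return a * math.exp(-b * t ** c)


def conditional_reliability(x, s, a=A_HAT, b=B_HAT, c=C_HAT):
    """Probability of no failure in (s, s + x], given a failure at time s."""
    return math.exp(-a * (math.exp(-b * s ** c) - math.exp(-b * (s + x) ** c)))


if __name__ == "__main__":
    print("remaining at t=250:", round(expected_remaining(250.0), 2))
    for x in (10.0, 20.0, 50.0):
        print("x =", x, "R =", round(conditional_reliability(x, 250.0), 2))
```

The computed values agree with those quoted in the text to the two decimal places reported there.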

SECTION 4


OPTIMUM SOFTWARE RELEASE TIME

4.1  INTRODUCTION

An important objective of developing the models in Sections 2 and 3 was to provide an analytical framework for estimating software performance measures which are needed for making various decisions. An important decision of practical concern is the determination of the time when testing can stop and the system be considered ready for release, that is, the determination of the software release time.

The operational performance of a software system is to a large extent dependent on the time spent in testing. The longer the testing phase, the better the performance. Also, the cost of fixing an error is generally much less during testing than during operation. However, the time spent in testing delays the release of the system for operational use and incurs additional cost. This suggests a reduction in test time and an early release of the system. In this section, we consider these conflicting objectives in the determination of the optimum release time.
In Section 4.2 we consider the release time problem based on a reliability criterion using the model of Section 2. Cost based optimum release time policies are developed in Sections 4.3 and 4.4 when the failure phenomenon follows a non-homogeneous Poisson process. The policy in Section 4.3 uses the model of Section 2 while the policy in Section 4.4 is for the failure model of Section 3.


4.2  SOFTWARE  RELEASE  TIME  BASED  ON  RELIABILITY  CRITERION 


For  a  non-homogeneous  Poisson  process  failure  model, 
the  conditional  reliability  at  operational  time  x,  given 
that  the  testing  has  proceeded  for  S  =  s  time  units,  is 
given  by 

$$R_{X\mid S}(x \mid s) = R = \exp\left[-a\left(e^{-bs} - e^{-b(s+x)}\right)\right] \tag{4.1}$$

or

$$R = \exp\left[-m(x)\,e^{-bs}\right], \tag{4.2}$$

where

$$m(x) = a\left(1 - e^{-bx}\right). \tag{4.3}$$

One commonly used criterion is to stop testing when the predicted reliability at a specified time x equals some given value. Then the problem reduces to solving (4.2) to find the value of s that satisfies this criterion.

Taking the logarithm of both sides of (4.2) and rearranging yields

$$s = \frac{1}{b}\left[\ln m(x) - \ln \ln \frac{1}{R}\right] \tag{4.4}$$


for  the  software  system  under  test.  In  (4.3)  and  (4.4), 
a  and  b  are  estimated  from  previous  data  and  R  and  x  are 
the  specified  values.  Therefore,  the  required  testing 
time  s  can  be  easily  determined. 

For  illustration  purposes,  consider  the  failure 
data  DS1  discussed  in  Section  2.8.  For  this  data  set 
a  =  1348  and  b  =  0.124.  Suppose  it  is  desired  that  the 
testing  be  continued  until  the  operational  reliability 
at  x  =  0.1  equals  0.70.  From  (4.4), 

$$s = \frac{1}{0.124}\left\{\ln\left[1348\left(1 - e^{-0.124(0.1)}\right)\right] - \ln \ln \frac{1}{0.7}\right\},$$

or s = 31 weeks.

In  other  words,  31  weeks  of  testing  will  be  needed 
before  the  system  can  be  released  to  assure  the  desired 
reliability. 
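
As a quick check of this calculation, the following Python sketch evaluates (4.4) and then substitutes the result back into (4.2). The function names are illustrative; a negative return value would simply mean that the target reliability is already met without any testing.

```python
import math


def required_test_time(a, b, x, target_r):
    """Testing time s from Equation (4.4) for a target reliability at operational time x."""
    m_x = a * (1.0 - math.exp(-b * x))  # m(x), Equation (4.3)
    return (math.log(m_x) - math.log(math.log(1.0 / target_r))) / b


def reliability(x, s, a, b):
    """Conditional reliability R(x | s) from Equation (4.2)."""
    m_x = a * (1.0 - math.exp(-b * x))
    return math.exp(-m_x * math.exp(-b * s))


if __name__ == "__main__":
    s = required_test_time(a=1348.0, b=0.124, x=0.1, target_r=0.70)
    print("required testing time (weeks):", round(s, 1))
    print("reliability check:", round(reliability(0.1, s, 1348.0, 0.124), 3))
```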

To see the effect of s on R(x|s), plots of reliability versus bs for m(x) = 5(5)50 are shown in Figure 4.1. We note that, as the testing time s is increased, while keeping x, and hence m(x), constant, R(x|s) increases.
4.3 OPTIMUM RELEASE TIME BASED ON COST CRITERION
(MODEL OF SECTION 2)

To determine the optimal policy, we first develop a cost model and then solve it to get the desired result.


4.3.1 Cost Model and Optimal Policy

Let

$c_1$ = cost of fixing an error during testing,
$c_2$ = cost of fixing an error during operation ($c_2 > c_1$),
$c_3$ = cost of testing per unit time,
t = software life-cycle length, and
T = software release time (same as testing time).

Since m(t) represents the expected number of errors during (0,t), the expected costs of fixing errors during the testing and the operational phases are $c_1 m(T)$ and $c_2[m(t) - m(T)]$, respectively. Further, the testing cost during T is $c_3 T$.

Combining the above costs, the total expected cost is given by

$$C(T, t) = C(T) = c_1 m(T) + c_2\,[m(t) - m(T)] + c_3 T. \tag{4.5}$$

These costs are also shown in Figure 4.2.

Our objective is to find the optimum value $T^*$ that minimizes (4.5). Differentiating (4.5) with respect to T, we get

$$\frac{dC(T)}{dT} = c_1 m'(T) - c_2 m'(T) + c_3. \tag{4.6}$$

Equating the right-hand side of (4.6) with zero and noting that $\lambda(T) = m'(T)$, we get

$$\lambda(T) = \frac{c_3}{c_2 - c_1}, \tag{4.7}$$

where

$$\lambda(T) = ab\,e^{-bT}. \tag{4.8}$$

Note that $\lambda(T)$ is a monotonically decreasing function of T and $\lambda(0) = ab$. If $ab \le c_3/(c_2 - c_1)$, (4.7) has no feasible solution and, for T > 0, $dC(T)/dT > 0$ (see Figure 4.3). Hence, for this case, the minimum of C(T) is at T = 0; that is, $T^* = 0$.

Now, if $ab > c_3/(c_2 - c_1)$, there exists a unique feasible solution of (4.7) given by (see Figure 4.3)

$$T_0 = \frac{1}{b}\,\ln\left[\frac{ab\,(c_2 - c_1)}{c_3}\right]. \tag{4.9}$$

Since $dC(T)/dT < 0$ for $0 < T < T_0$ and $dC(T)/dT > 0$ for $T > T_0$, the minimum of C(T) is at $T = T_0$ for $T_0 \le t$ and at T = t for $T_0 > t$. These can be summarized as follows.

Theorem 4.1. (i) If $ab > c_3/(c_2 - c_1)$, then there exists a unique feasible solution of (4.7) and the optimum release time is

$$T^* = \min\{T_0, t\},$$

where $T_0$ is given by (4.9).

(ii) If $ab \le c_3/(c_2 - c_1)$, then $T^* = 0$.

It should be noted that, if the minimum expected cost exceeds the operational benefit to be gained, no testing should be undertaken.

To illustrate the above results, consider the data set mentioned in Section 4.2. Here a = 1348 and b = 0.124. Let $c_1 = 1$, $c_2 = 5$, $c_3 = 100$, and t = 100. Then ab = 1348(0.124) = 167 and

$$\frac{c_3}{c_2 - c_1} = 25.$$

Since $ab > c_3/(c_2 - c_1)$, the optimum release time is

$$T^* = \min\{(1/0.124)\ln(167/25),\ 100\}$$

or

$$T^* = \min\{15.3,\ 100\} = 15.3.$$

Hence, the optimum solution for this case is to allocate 15.3 weeks for testing and 84.7 weeks for operation. The cost associated with this policy will be $C(T^*) = 3687$.
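
The policy of Theorem 4.1 is short enough to express directly in code. The following Python sketch is one possible rendering, with illustrative function names, that also evaluates the cost (4.5) at the optimum for the example above.

```python
import math


def mean_value(T, a, b):
    """m(T) = a(1 - exp(-bT)), the mean value function of the Section 2 model."""
    return a * (1.0 - math.exp(-b * T))


def total_cost(T, t, a, b, c1, c2, c3):
    """Total expected cost C(T) of Equation (4.5)."""
    return c1 * mean_value(T, a, b) + c2 * (mean_value(t, a, b) - mean_value(T, a, b)) + c3 * T


def optimum_release_time(t, a, b, c1, c2, c3):
    """Optimum release time T* according to Theorem 4.1 (requires c2 > c1)."""
    threshold = c3 / (c2 - c1)
    if a * b <= threshold:
        return 0.0  # case (ii): testing never pays for itself
    T0 = math.log(a * b / threshold) / b  # Equation (4.9)
    return min(T0, t)  # case (i)


if __name__ == "__main__":
    params = dict(a=1348.0, b=0.124, c1=1.0, c2=5.0, c3=100.0)
    T_star = optimum_release_time(t=100.0, **params)
    print("T* =", round(T_star, 1))
    print("C(T*) =", round(total_cost(T_star, 100.0, **params)))
```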

4.3.2 Sensitivity Analysis of $T^*$

Now we investigate the effects of the parameters a, b, and $c_r \equiv c_3/(c_2 - c_1)$ on the optimum release time. First, from Theorem 4.1, we note that $T^*$ equals 0, t, or $T_0$. Since $T^* = 0$ and $T^* = t$ are degenerate cases, we shall consider only the case when $T^* = T_0$.

From (4.9), we see that $T_0$ increases logarithmically with a and decreases logarithmically with $c_r$ as others are kept constant. Next, the first and second derivatives of $T_0$ with respect to b indicate that $T_0$ is a concave function of b with maximum at $b_0 = e\,c_r/a$, and the maximum value of $T_0$ is $1/b_0$.

In practice, for a given software system, the value of a prior to testing is fixed and one may be interested in the joint effect of b and $c_r$ on $T_0$. The value of b can be affected by an appropriate selection of testing strategies and techniques. For the data set discussed earlier, a = 1348. For this case, contours of $T_0$ in the b-$c_r$ plane are shown in Figure 4.4. Also shown is the optimum value of T corresponding to the above numerical example. This diagram can also be used to determine the value of b if $T_0$ is fixed due to some other considerations. Thus, if $c_r = 25$ and $T_0 = 15$, we need b = 0.13. If, however, $T_0 = 10$, b must be 0.265.
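
The values read from the contour diagram can also be approximated numerically. The Python sketch below assumes the relation $T_0 = (1/b)\ln(ab/c_r)$ from (4.9) and uses a plain bisection search on the branch $b > b_0$, where $T_0$ decreases in b. The helper names are illustrative only.

```python
import math


def t0(a, b, cr):
    """T0 = (1/b) ln(ab / cr), Equation (4.9) with cr = c3 / (c2 - c1)."""
    return math.log(a * b / cr) / b


def b_maximizing_t0(a, cr):
    """Value b0 = e * cr / a at which T0 is largest (the maximum equals 1 / b0)."""
    return math.e * cr / a


def b_for_target_t0(a, cr, target):
    """Find b > b0 with T0(b) = target by bisection; requires target < 1 / b0."""
    lo = b_maximizing_t0(a, cr)
    if target >= 1.0 / lo:
        raise ValueError("target exceeds the largest attainable T0")
    hi = 2.0 * lo
    while t0(a, hi, cr) > target:  # expand until the target is bracketed
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if t0(a, mid, cr) > target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    a, cr = 1348.0, 25.0
    print("b0 =", round(b_maximizing_t0(a, cr), 4))
    for target in (15.0, 10.0):
        print("T0 =", target, "needs b =", round(b_for_target_t0(a, cr, target), 3))
```

The computed values of b differ slightly in the third decimal place from those quoted in the text, as would be expected if the latter were read from the contour plot.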


4.4 OPTIMUM RELEASE TIME BASED ON COST CRITERION
(MODEL OF SECTION 3)

4.4.1 Cost Model and Optimal Policy

The cost model for this case is similar to that given in Equation (4.5) and is (quantities are as defined in Section 4.3.1)

$$C(T; t) = c_1 M(T) + c_2\,(M(t) - M(T)) + c_3 T, \tag{4.10}$$

where

$$M(x) = a\left(1 - e^{-bx^{c}}\right). \tag{4.11}$$

On differentiating (4.10) with respect to T and equating the result to zero, we get

$$\lambda(T) \equiv M'(T) = \frac{c_3}{c_2 - c_1}, \tag{4.12}$$

where

$$\lambda(T) = \alpha\,T^{\gamma - 1}\,e^{-\beta T^{\gamma}}, \tag{4.13}$$

$$\alpha = abc, \qquad \beta = b, \qquad \gamma = c.$$

To solve (4.12) for T, we consider three cases depending on the value of $\gamma$, viz. $\gamma > 1$, $\gamma = 1$, and $0 < \gamma < 1$.

Case When $\gamma > 1$

For this case, the failure distribution has an increasing failure rate followed by a decreasing failure rate. Then we see from (4.13) that $\lambda(T)$ is a unimodal function of T with $\lambda(0) = \lambda(\infty) = 0$. Also, its maximum $\lambda(T_m)$ occurs at $T_m$ where

$$T_m = \left(\frac{\gamma - 1}{\beta\gamma}\right)^{1/\gamma} \tag{4.14}$$

and

$$\lambda(T_m) = \alpha\left(\frac{\gamma - 1}{\beta\gamma e}\right)^{(\gamma - 1)/\gamma}. \tag{4.15}$$

If $\lambda(T_m) < c_3/(c_2 - c_1)$ and $\lambda(T) < c_3/(c_2 - c_1)$ for T > 0, then it is easy to see that Equation (4.12) has no feasible solution for T. Therefore, for T > 0, $dC(T;t)/dT > 0$, and the minimum of C(T;t) is at T = 0. In other words, if $\lambda(T_m) < c_3/(c_2 - c_1)$ and $\lambda(T) < c_3/(c_2 - c_1)$, then $T^* = 0$. This is shown graphically in Figure 4.5.
If $\lambda(T_m) = c_3/(c_2 - c_1)$, then $dC(T;t)/dT > 0$ for $0 < T < T_m$ and for $T > T_m$. Then Equation (4.12) has a unique feasible solution $T = T_m$. However, $T = T_m$ is an inflection point of C(T;t) for this case. Therefore, the minimum of C(T;t) is at T = 0, i.e., $T^* = 0$ as seen in Figure 4.6.

If, however, $\lambda(T_m) > c_3/(c_2 - c_1)$, there exist two feasible solutions $T = T_1$ and $T = T_2$, $0 < T_1 < T_2 < \infty$, which are the two positive roots of Equation (4.12). Also,

$$\lambda(T) < \frac{c_3}{c_2 - c_1}, \qquad 0 \le T < T_1,\ T > T_2,$$

and

$$\lambda(T) > \frac{c_3}{c_2 - c_1}, \qquad T_1 < T < T_2.$$

For this case, $T_1$ and $T_2$ can be obtained by solving Equation (4.12) numerically. It should also be pointed out that $dC(T;t)/dT > 0$ for $0 < T < T_1$, $T > T_2$, and $dC(T;t)/dT < 0$ for $T_1 < T < T_2$. In this case, we consider the minimum of C(T;t) for the following three cases (see Figure 4.7).


[Fig. 4.7. Plots of C(T;t) and $\lambda(T)$ for $\gamma > 1$ and $\lambda(T_m) > c_3/(c_2 - c_1)$.]


Case A. If $C(0;t) > C(T_2;t)$, then the minimum of C(T;t) is at $T = T_2$ for $t \ge T_2$ and is at T = t for $t < T_2$.

Case B. If $C(0;t) < C(T_2;t)$, then the minimum of C(T;t) is at T = 0.

Case C. If $C(0;t) = C(T_2;t)$, then the minimum of C(T;t) is at 0 or $T_2$ for $t \ge T_2$ and is at T = 0 for $t < T_2$.
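
Since $T_1$ and $T_2$ have to be found numerically, a small root finder is useful here. The Python sketch below is one possible approach: it brackets each root on either side of $T_m$, refines it by bisection, and then applies Cases A to C. The model parameters are the Section 3 estimates (a = 27.2, b = 0.000783, c = 1.5). The cost figures and the life-cycle length are hypothetical values chosen only so that two roots exist.

```python
import math


def M(x, a, b, c):
    """Mean value function M(x) = a(1 - exp(-b x^c)), Equation (4.11)."""
    return a * (1.0 - math.exp(-b * x ** c))


def rate(T, a, b, c):
    """lambda(T) = abc T^(c-1) exp(-b T^c), Equation (4.13), for T > 0."""
    return a * b * c * T ** (c - 1.0) * math.exp(-b * T ** c)


def cost(T, t, a, b, c, c1, c2, c3):
    """Total expected cost C(T; t), Equation (4.10)."""
    return c1 * M(T, a, b, c) + c2 * (M(t, a, b, c) - M(T, a, b, c)) + c3 * T


def bisect(f, lo, hi, tol=1e-10):
    """Bisection for a root of f on [lo, hi]; f(lo) and f(hi) must differ in sign."""
    f_lo_positive = f(lo) > 0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if (f(mid) > 0) == f_lo_positive:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def positive_roots(a, b, c, cr):
    """Roots T1 < T_m < T2 of lambda(T) = cr for c > 1, or None if lambda(T_m) <= cr."""
    T_m = ((c - 1.0) / (b * c)) ** (1.0 / c)  # Equation (4.14)
    if rate(T_m, a, b, c) <= cr:
        return None
    f = lambda T: rate(T, a, b, c) - cr
    lo, hi = T_m, T_m
    while f(lo) > 0:  # lambda(T) -> 0 as T -> 0 when c > 1
        lo /= 2.0
    while f(hi) > 0:  # lambda(T) -> 0 as T -> infinity
        hi *= 2.0
    return bisect(f, lo, T_m), bisect(f, T_m, hi)


def optimum_release_time(t, a, b, c, c1, c2, c3):
    """T* for c > 1; a tie (Case C) is resolved here in favour of T = 0."""
    roots = positive_roots(a, b, c, c3 / (c2 - c1))
    if roots is None:
        return 0.0
    T2 = roots[1]
    if cost(0.0, t, a, b, c, c1, c2, c3) > cost(T2, t, a, b, c, c1, c2, c3):
        return min(T2, t)  # Case A
    return 0.0  # Cases B and C


if __name__ == "__main__":
    model = dict(a=27.2, b=0.000783, c=1.5)
    costs = dict(c1=1.0, c2=5.0, c3=0.4)  # hypothetical cost figures
    print("roots:", positive_roots(cr=0.1, **model))
    print("T* =", optimum_release_time(t=250.0, **model, **costs))
```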


Case When $\gamma = 1$

For this case, $\lambda(T) = \alpha e^{-\beta T}$ where $\alpha = ab$ and $\beta = b$. Now $\lambda(T)$ is a monotonically decreasing function of T and $\lambda(0) = \alpha$. If $\alpha \le c_3/(c_2 - c_1)$, (4.12) has no feasible solution and $dC(T;t)/dT > 0$ for T > 0. Therefore, the minimum of C(T;t) is at T = 0, i.e., $T^* = 0$. If, however, $\alpha > c_3/(c_2 - c_1)$, then there exists a unique solution of (4.12) given by

$$T_0 = \frac{1}{\beta}\,\ln\left\{\frac{\alpha\,(c_2 - c_1)}{c_3}\right\}. \tag{4.16}$$

From the fact that $dC(T;t)/dT < 0$ for $0 < T < T_0$ and $dC(T;t)/dT > 0$ for $T > T_0$, the minimum of C(T;t) is at $T = T_0$ for $t \ge T_0$ and at T = t for $t < T_0$.
Case When $0 < \gamma < 1$

For this case, $\lambda(T)$ is a monotonically decreasing function of T with $\lambda(0) = \infty$ and $\lambda(\infty) = 0$. Then the unique positive root $T_3$ of Equation (4.12) is the solution. Further, since $dC(T;t)/dT < 0$ for $0 \le T < T_3$ and $dC(T;t)/dT > 0$ for $T > T_3$, the minimum of C(T;t) is at $T = T_3$ for $t \ge T_3$ and at T = t for $t < T_3$, as seen in Figure 4.8. We can summarize the above results in the following theorem.

Theorem 4.2. Suppose that $\alpha$, $\beta$, $\gamma$, $c_1$, $c_2$ ($> c_1$), and $c_3$ are all greater than zero. Then the optimum release time $T^*$ is given by the following expressions for the cases when $\gamma > 1$, $\gamma = 1$, and $0 < \gamma < 1$.

Case When $\gamma > 1$

Case A. If $\alpha\left(\frac{\gamma - 1}{\beta\gamma e}\right)^{(\gamma-1)/\gamma} \le \frac{c_3}{c_2 - c_1}$, then $T^* = 0$.

Case B. If $\alpha\left(\frac{\gamma - 1}{\beta\gamma e}\right)^{(\gamma-1)/\gamma} > \frac{c_3}{c_2 - c_1}$, then there exist two feasible solutions $T = T_1$ and $T = T_2$ ($0 < T_1 < T_2 < \infty$) which are the two positive roots of Equation (4.12) and the optimum release time is as follows:

If $C(0;t) > C(T_2;t)$, then $T^* = \min\{T_2; t\}$.

If $C(0;t) < C(T_2;t)$, then $T^* = 0$.

If $C(0;t) = C(T_2;t)$, then $T^* = 0$ and $T_2$ for $t \ge T_2$, and $T^* = 0$ for $t < T_2$.

Case When $\gamma = 1$

If $\alpha \le \frac{c_3}{c_2 - c_1}$, then $T^* = 0$.

If $\alpha > \frac{c_3}{c_2 - c_1}$, then there exists a unique solution

$$T_0 = \frac{1}{\beta}\,\ln\left(\frac{\alpha\,(c_2 - c_1)}{c_3}\right)$$

of Equation (4.12) and the optimum release time is $T^* = \min(T_0; t)$.

Case When $0 < \gamma < 1$

For this case, Equation (4.12) has a unique positive root $T_3$ which is the solution, and the optimum release time is $T^* = \min\{T_3; t\}$.
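
For the case $0 < \gamma < 1$ the root $T_3$ of (4.12) has no closed form, but because $\lambda(T)$ is monotonically decreasing it can be bracketed and found by bisection. The Python sketch below illustrates this. All parameter values in it are hypothetical and chosen only for demonstration.

```python
import math


def rate(T, a, b, c):
    """lambda(T) = abc T^(c-1) exp(-b T^c), Equation (4.13), for T > 0."""
    return a * b * c * T ** (c - 1.0) * math.exp(-b * T ** c)


def unique_root(a, b, c, cr, tol=1e-10):
    """Unique positive root T3 of lambda(T) = cr when 0 < c < 1."""
    f = lambda T: rate(T, a, b, c) - cr
    lo = hi = 1.0
    while f(lo) < 0:  # lambda(T) -> infinity as T -> 0
        lo /= 2.0
    while f(hi) > 0:  # lambda(T) -> 0 as T -> infinity
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def optimum_release_time(t, a, b, c, c1, c2, c3):
    """T* = min{T3; t} for 0 < c < 1, as in the last case of Theorem 4.2."""
    return min(unique_root(a, b, c, c3 / (c2 - c1)), t)


if __name__ == "__main__":
    # Hypothetical parameters: a = 100, b = 0.05, c = 0.8, cr = 4 / (5 - 1) = 1.
    print("T3 =", round(unique_root(100.0, 0.05, 0.8, 1.0), 2))
    print("T* for t = 10:", optimum_release_time(10.0, 100.0, 0.05, 0.8, 1.0, 5.0, 4.0))
```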
A1. INTRODUCTION


In  this  section  we  describe  the  development  of  sto¬ 
chastic  models  for  performance  and  cost  evaluation  of 
hardware-software  systems  in  the  operational  phase. 

Section A2 deals with the development of stochastic models for system performance assessment. The state of the system is described by up or down states of the hardware and the software system and by the number of errors in the software system. The hardware-software system is down if either the hardware or the software system is down, and up if both are up. The hardware failure distribution is exponential with failure rate $\beta$. The software failure distribution between occurrences of software failures is also exponential with a failure rate $i\lambda$, where i = 0,1,...,N is the number of remaining errors in the system. The repair times are exponential with parameters $\gamma$ and $\mu$, and the probabilities of perfect repair are $p_h$ and $p_s$ for the hardware and the software systems, respectively.

Based on this model, expressions for various stochastic performance measures are also developed in Section A2. These are distribution of time to a specified number of remaining software errors; state occupancy probabilities; expected number of hardware, software, and hardware-software failures detected by time t; system reliability, availability and average availability.

In some cases, an improvement in one performance measure causes a worsening of another. For example, an improved system availability causes an increase in the expected number of failures. In order to evaluate the effect of these conflicting measures on system performance, cost models are developed in Section A3 for the hardware, software, and hardware-software systems. Each model gives expected total cost by time t and consists of three cost elements: the cost of failures, the cost of repairs, and the cost due to system unavailability.

The  results  of  a  numerical  study  to  investigate  the 
effects  of  cost  factors,  failure  rates,  and  repair  rates 
on  the  expected  number  of  failures,  average  availability 
and  expected  total  cost/unit  time  are  also  discussed. 


A2. A MARKOV MODEL FOR HARDWARE-SOFTWARE
SYSTEM AND PERFORMANCE MEASURES

In this section we develop a stochastic model and expressions for the performance measures of a hardware-software system. The basic model is developed in Section A2.1 and assumes the system behavior to be Markovian.

In order to use this model to evaluate and predict the system performance, we generally need expressions for the appropriate quantitative measures. Such expressions for the following measures are derived in Sections A2.2 to A2.5.

(i)  Distribution  of  time  to  a  specified  number  of 
remaining  errors  in  the  software  system. 

(ii)  State  occupancy  probabilities. 

(iii)  System  reliability  and  availability. 

(iv)  Expected  number  of  software,  hardware,  and 
total  failures  by  time  t. 

A2.1 System Description and Model Development

Consider a system consisting of hardware and software components, all of which are subject to random failures. The hardware components fail due to either defects or wear-out. A software component is said to fail when a fault, a specific manifestation of an error in the program, is evoked by some input data resulting in the program not correctly computing the required function. Whenever any of these failures occurs, the system goes out of operation. A repair activity is then undertaken to remove the cause of the failure and bring the system back to an operational state.

In the present study, we assume that the hardware and software components can be viewed as a single system each. In other words, the hardware-software system will be treated as a 2-unit (or 2-system) system, one unit representing the hardware components and the other the software components. The up and down states of such a system are shown in Figure A2.1.

We  develop  a  model  for  the  stochastic  behavior  of  the 
system  under  the  following  assumptions: 

(i) The errors in the software system are independent of each other and each has an error occurrence rate $\lambda$.

(ii) The failures of the hardware system are independent of each other and have a constant occurrence rate $\beta$. Only those failures which cause the system to go down are considered.

(iii) The probability of two or more software or hardware failures occurring simultaneously is negligible.

(iv) The time to remove a software error, when there are i errors in the system, follows an exponential distribution with parameter $\mu_i$.

(v) The time to remove the cause of a hardware failure follows an exponential distribution with parameter $\gamma$.

(vi) Failures and repairs of the hardware system are independent of both the failures and repairs of the software system.

(vii) At most one software error is removed at correction time and no new software errors are introduced during the error removal (correction) phase.

(viii) When the system is inoperative due to the occurrence of a software failure, the error causing the failure, when detected, is corrected with probability $p_s$ ($0 \le p_s \le 1$), while with probability $q_s$ ($p_s + q_s = 1$) the error is not removed. Thus, $q_s$ is the probability of imperfect maintenance of software.

(ix) After the occurrence of a hardware failure, the cause of the failure is removed with probability $p_h$ ($0 \le p_h \le 1$), while with probability $q_h$ ($p_h + q_h = 1$) the cause is not removed. Thus, $q_h$ is the probability of imperfect maintenance of hardware.

(x) The system is considered to be inoperative whenever it is under maintenance following a hardware or a software failure.

Now,  we  examine  the  failure  and  repair  times  of  the 
software  and  hardware  systems  independently,  based  upon 
the  above  assumptions. 

Software failures, from assumptions (i) and (iii), follow an exponential distribution. Let i be the number of errors in the software system. Then the probability density function (pdf) of the time to next software failure, $T_i$, is given by the distribution of the first order statistic of i exponential distributions each with parameter $\lambda$, i.e.

$$f_i(t) = \binom{i}{1}\left(\lambda e^{-\lambda t}\right)\left(e^{-\lambda t}\right)^{i-1}$$

or

$$f_i(t) = i\lambda\,e^{-i\lambda t}. \tag{A2.1}$$

Letting $\lambda_i = i\lambda$, the pdf and the cumulative distribution function (cdf) of $T_i$ can be written as

$$f_i(t) = \lambda_i e^{-\lambda_i t} \tag{A2.2}$$

and

$$F_i(t) = 1 - e^{-\lambda_i t}. \tag{A2.3}$$

From assumption (iv), the cdf of the software maintenance time when there are i errors in the system, $W_i$, is

$$P(W_i \le t) = 1 - e^{-\mu_i t}. \tag{A2.4}$$

The cdf's of the hardware time to failure, U, and maintenance time, V, from assumptions (ii) and (v), respectively, are:

$$P(U \le t) = 1 - e^{-\beta t} \tag{A2.5}$$

and

$$P(V \le t) = 1 - e^{-\gamma t}. \tag{A2.6}$$

To summarize, hardware failures and repairs occur according to exponential distributions with parameters $\beta$ and $\gamma$, respectively. These parameters are considered to remain constant. The distribution of the times between software failures also follows an exponential distribution, but its parameter, $\lambda_i$, varies with the number of errors remaining in the software system, i. The distribution of the maintenance time for software is again exponential with a parameter $\mu_i$ which changes with i.

Now, we consider the failure phenomenon in the total hardware-software system. Suppose there are i errors in the software and the total system is operational. Let

$$Y_i = \min(T_i, U). \tag{A2.7}$$

It can be easily shown that $Y_i$ has an exponential distribution with parameter $(\beta + \lambda_i)$ and

$$F_{Y_i}(y) = 1 - e^{-(\beta + \lambda_i)y}. \tag{A2.8}$$

The probability that a software failure will occur before a hardware failure is

$$P(T_i < U) = \int_0^\infty P(U > T_i \mid T_i = t)\,dF_i(t) = \int_0^\infty P(U > t)\,\lambda_i e^{-\lambda_i t}\,dt = \lambda_i \int_0^\infty e^{-(\beta + \lambda_i)t}\,dt,$$

so that

$$P(T_i < U) \equiv p_i = \frac{\lambda_i}{\beta + \lambda_i}, \qquad i = 0, 1, \ldots, N. \tag{A2.9}$$

Similarly, the probability that a hardware failure occurs before a software failure is

$$P(U < T_i) \equiv q_i = \frac{\beta}{\beta + \lambda_i}, \qquad i = 0, 1, \ldots, N. \tag{A2.10}$$

In other words, when the hardware-software system is operational with i software errors, the time to next failure is given by $Y_i$. The probability of the next failure being a software failure is $p_i$ and being a hardware failure is $q_i$.
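
Equations (A2.8) and (A2.9) lend themselves to a simple simulation check. The Python sketch below draws independent exponential failure times for the software and the hardware and compares the empirical results with $\lambda_i/(\beta + \lambda_i)$ and $1/(\beta + \lambda_i)$. The rates $\lambda = 0.02$ and $\beta = 0.01$ with i = 10 are used only as sample values, and the function name and seed are arbitrary.

```python
import random


def simulate_next_failure(i, lam, beta, n_trials, seed=0):
    """Estimate P(software failure comes first) and E[Y_i] by Monte Carlo.

    Requires i >= 1 so that the software failure rate i * lam is positive.
    """
    rng = random.Random(seed)
    lam_i = i * lam
    software_first = 0
    total_time = 0.0
    for _ in range(n_trials):
        t_soft = rng.expovariate(lam_i)  # time to next software failure, T_i
        t_hard = rng.expovariate(beta)   # time to next hardware failure, U
        total_time += min(t_soft, t_hard)  # Y_i = min(T_i, U)
        if t_soft < t_hard:
            software_first += 1
    return software_first / n_trials, total_time / n_trials


if __name__ == "__main__":
    i, lam, beta = 10, 0.02, 0.01
    p_hat, mean_y = simulate_next_failure(i, lam, beta, n_trials=200000)
    print("simulated p_i:", round(p_hat, 3), "theory:", round(i * lam / (beta + i * lam), 3))
    print("simulated E[Y_i]:", round(mean_y, 2), "theory:", round(1.0 / (beta + i * lam), 2))
```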


Let X(t) denote the state of the system at time t, where

$$X(t) = \begin{cases} i, & \text{the system is operational while there are } i \text{ errors remaining in the software system},\ i = 0, 1, 2, \ldots, N, \\ i_s, & \text{the system is down for maintenance of software with } i \text{ software errors},\ i_s = 1_s, 2_s, \ldots, N_s, \\ i_h, & \text{the system is down for maintenance of hardware with } i \text{ software errors},\ i_h = 0_h, 1_h, \ldots, N_h. \end{cases} \tag{A2.11}$$


The  transitions  between  the  states  of  the  system,  i.e.. 


To summarize the system behavior, consider once again the above situation, i.e., the total system is operational with i software errors. The time to a failure is governed by $Y_i$. If a software failure occurs, and the probability of this occurring is $p_i$, the system goes into a down state $i_s$. The system undergoes software maintenance and, after a random time governed by Equation (A2.4), goes to state i with probability $q_s$ and to state (i-1) with probability $p_s$.

If the failure is a hardware failure, and the probability of this happening is $q_i$, the system goes into a down state, $i_h$. Following a repair for the failure according to Equation (A2.6), the system goes back to state i with probability $p_h$ or stays in state $i_h$ with probability $q_h$.

The above system behavior is valid only until the software is error-free. After the software is error-free, the total system reduces to a hardware system only.

Thus, we see that the stochastic process X(t) forms a semi-Markov process. It makes transitions as described above and the times spent in various states are random, given by $Y_i$, $W_i$, or V, depending on the state. A typical realization of the X(t) process corresponding to Figure A2.2 is shown in Figure A2.3.

Let $Q_{k,j}(t)$, $k, j = i, i_s, i_h$, be the one step transition probability that after making a transition into state k, the process X(t) next makes a transition into state j in an amount of time less than or equal to t. Then, $Q_{k,j}(t)$ is given by the product of $P_{k,j}$ and the cdf up to t of the time corresponding to state k. Thus, for k = i and $j = i_s$, we have


The expressions for the $Q_{k,j}(t)$'s given by Equation (A2.14) constitute the basic equations that describe the stochastic behavior of the X(t) process. These equations will be used in the subsequent sections to derive the system performance measures. We will need the Laplace-Stieltjes transforms of the $Q_{k,j}(t)$'s and some related results. These are given below.
Let $\mathcal{L}$ and $\mathcal{LS}$ denote the Laplace and Laplace-Stieltjes transform, respectively; and for any functions g and G, let

$$g^*(s) = \mathcal{L}(g(t)), \quad \text{and} \quad \tilde{G}(s) = \mathcal{LS}(G(t)).$$

The Laplace-Stieltjes transforms of the above $Q_{i,j}(t)$'s are

$$\mathcal{LS}(Q_{i,i_s}) = \frac{\lambda_i}{s + \beta + \lambda_i}, \tag{A2.16}$$

$$\mathcal{LS}(Q_{i,i_h}) = \frac{\beta}{s + \beta + \lambda_i}, \tag{A2.17}$$

$$\mathcal{LS}(Q_{i_s,i}) = \frac{q_s \mu_i}{s + \mu_i}, \tag{A2.18}$$

$$\mathcal{LS}(Q_{i_s,i-1}) = \frac{p_s \mu_i}{s + \mu_i}, \tag{A2.19}$$

$$\mathcal{LS}(Q_{i_h,i}) = \frac{p_h \gamma}{s + \gamma}, \tag{A2.20}$$

$$\mathcal{LS}(Q_{i_h,i_h}) = \frac{q_h \gamma}{s + \gamma}. \tag{A2.21}$$

The following Lemmas from the basic Laplace and Laplace-Stieltjes transforms and their inverses will be useful for our analysis (see Abramowitz et al., 1965, and Muth, 1977).

Lemma A2.1. (Linearity property). If

$$h(t) = A f(t) + B g(t),$$

then

$$h^*(s) = \mathcal{L}(h(t)) = A f^*(s) + B g^*(s).$$

Lemma A2.2. The Laplace transform of pdf f(t) is equivalent to the Laplace-Stieltjes transform of its cdf F(t):

$$f^*(s) = \mathcal{L}\{f(t)\} = \int_0^\infty e^{-st} f(t)\,dt = \int_0^\infty e^{-st}\,dF(t) = \mathcal{LS}\{F(t)\}.$$

Lemma A2.3. (Heaviside Expansion Theorem). If

(i) $q(s) = (s - a_1)(s - a_2) \cdots (s - a_m)$, where $a_1 \ne a_2 \ne \cdots \ne a_m$,

(ii) p(s) is a polynomial of degree less than m, and

(iii) $f^*(s) = \dfrac{p(s)}{q(s)}$,

then

$$f(t) = \sum_{n=1}^{m} \frac{p(a_n)}{q'(a_n)}\,e^{a_n t},$$

where $q'(a_i) = \prod_{j=1,\,j \ne i}^{m} (a_i - a_j)$, and

$$F(t) = \sum_{n=1}^{m} \frac{p(a_n)}{q'(a_n)\,a_n}\left(e^{a_n t} - 1\right).$$

If one of the $a_n$ is zero, i.e. $a_i = 0$, $1 \le i \le m$, then

$$F(t) = \frac{p(a_i)}{q'(a_i)}\,t + \sum_{n=1,\,n \ne i}^{m} \frac{p(a_n)}{q'(a_n)\,a_n}\left(e^{a_n t} - 1\right).$$
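
To make Lemma A2.3 concrete, the following Python sketch inverts a rational transform with distinct real poles. The function name is illustrative, and the restriction to real roots is a simplifying assumption of the sketch rather than of the lemma.

```python
import math


def heaviside_inverse(p, roots):
    """Return f(t) for f*(s) = p(s) / prod_n (s - a_n) with distinct real roots a_n.

    `p` is a callable numerator polynomial whose degree is below len(roots).
    """
    def f(t):
        total = 0.0
        for n, a_n in enumerate(roots):
            dq = 1.0  # q'(a_n) = prod over j != n of (a_n - a_j)
            for j, a_j in enumerate(roots):
                if j != n:
                    dq *= a_n - a_j
            total += p(a_n) / dq * math.exp(a_n * t)
        return total
    return f


if __name__ == "__main__":
    # f*(s) = (s + 3) / ((s + 1)(s + 2)) = 2/(s + 1) - 1/(s + 2)
    f = heaviside_inverse(lambda s: s + 3.0, [-1.0, -2.0])
    t = 0.7
    print(f(t), 2.0 * math.exp(-t) - math.exp(-2.0 * t))
```

The two printed numbers should coincide, since the partial-fraction form on the right is the known inverse for this small test case.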
A2.2 Distribution of Time to a Specified Number of Remaining Errors in a Software System

The errors remaining in the software system are sources of failures and we would like to remove them as soon as possible. However, it is not always feasible and/or practical to remove all of them in a reasonable time. In that case, we would like to know the distribution of time to n ($0 \le n \le N$) remaining errors.

Let $T_{i,n}$ be the first passage time from operational state i to operational state n and let $G_{i,n}$ be its cdf. Now we derive the equations for $g_{N,n}(t)$ and $G_{N,n}(t)$, the pdf and cdf, respectively, of $T_{N,n}$.


A2.2.1 Distribution of $T_{N,n}$

Consider a time interval (x, x+dx). For any i, the probability of going from i to $i_s$ in this interval is $dQ_{i,i_s}(x)$ and the probability of going from i to $i_h$ is $dQ_{i,i_h}(x)$. Once the X(t) process reaches either $i_s$ or $i_h$, further transitions in it will be governed by cdf's $G_{i_s,n}$ and $G_{i_h,n}$, respectively. Thus, the renewal equation for $G_{i,n}$, i = n+1, ..., N can be written as
i ,  n 
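$$G_{i,n}(t) = \sum_{k \in E} \int_0^t G_{k,n}(t - x)\,dQ_{i,k}(x)$$

$$= Q_{i,i_h} * G_{i_h,n}(t) + Q_{i,i_s} * G_{i_s,n}(t)$$

$$= Q_{i,i_h} * Q_H * Q_{i_h,i} * G_{i,n}(t) + Q_{i,i_s} * Q_{i_s,i} * G_{i,n}(t) + Q_{i,i_s} * Q_{i_s,i-1} * G_{i-1,n}(t), \tag{A2.22}$$

where E is the state space, $G_{n,n} = 1$, $Q_H = \sum_{j=0}^{\infty} Q^{(j)}_{i_h,i_h}$, and $Q^{(j)}_{i_h,i_h}$ is the j-fold convolution of $Q_{i_h,i_h}$ with itself.

Taking the Laplace-Stieltjes (L-S) transform of Equation (A2.22) we get

$$\tilde{G}_{i,n}(s) = \tilde{Q}_{i,i_h}(s)\,\tilde{Q}_H(s)\,\tilde{Q}_{i_h,i}(s)\,\tilde{G}_{i,n}(s) + \tilde{Q}_{i,i_s}(s)\,\tilde{Q}_{i_s,i}(s)\,\tilde{G}_{i,n}(s) + \tilde{Q}_{i,i_s}(s)\,\tilde{Q}_{i_s,i-1}(s)\,\tilde{G}_{i-1,n}(s), \tag{A2.23}$$

where

$$\tilde{Q}_H(s) = \sum_{j=0}^{\infty} \left[\tilde{Q}_{i_h,i_h}(s)\right]^{j} = \frac{s + \gamma}{s + p_h\gamma}$$

and, from Equations (A2.16) to (A2.21),

$$\tilde{Q}_{i,i_h}(s) = \frac{\beta}{s + \beta + \lambda_i}, \quad \tilde{Q}_{i_h,i_h}(s) = \frac{q_h\gamma}{s + \gamma}, \quad \tilde{Q}_{i_h,i}(s) = \frac{p_h\gamma}{s + \gamma},$$

$$\tilde{Q}_{i,i_s}(s) = \frac{\lambda_i}{s + \beta + \lambda_i}, \quad \tilde{Q}_{i_s,i}(s) = \frac{q_s\mu_i}{s + \mu_i}, \quad \tilde{Q}_{i_s,i-1}(s) = \frac{p_s\mu_i}{s + \mu_i}.$$

On substituting the expressions for the various L-S transforms in Equation (A2.23) and simplifying, we get

$$\tilde{G}_{i,n}(s) = a_i\,\tilde{G}_{i,n}(s) + b_i\,\tilde{G}_{i-1,n}(s), \tag{A2.24}$$

where

$$a_i = \frac{p_h\gamma\beta\,(s + \mu_i) + q_s\lambda_i\mu_i\,(s + p_h\gamma)}{(s + p_h\gamma)(s + \beta + \lambda_i)(s + \mu_i)}, \tag{A2.25}$$

$$b_i = \frac{p_s\lambda_i\mu_i}{(s + \mu_i)(s + \beta + \lambda_i)}. \tag{A2.26}$$

For i = n+1, $\tilde{G}_{i-1,n}(s) = \tilde{G}_{n,n}(s) = 1$, and

$$\tilde{G}_{i,n}(s) = \tilde{G}_{n+1,n}(s) = \frac{b_i}{1 - a_i} \tag{A2.27}$$

$$= \frac{p_s\lambda_i\mu_i\,(s + p_h\gamma)}{(s + x_{1,i})(s + x_{2,i})(s + x_{3,i})}, \tag{A2.28}$$

where $-x_{1,i}$, $-x_{2,i}$, and $-x_{3,i}$ are the roots of the polynomial

$$s^3 + s^2(\lambda_i + \mu_i + \beta + p_h\gamma) + s\,(p_s\lambda_i\mu_i + \beta\mu_i + \lambda_i p_h\gamma + \mu_i p_h\gamma) + p_s p_h\gamma\lambda_i\mu_i.$$

$$\tilde{G}_{N,n}(s) = \prod_{i=n+1}^{N} \frac{p_s\lambda_i\mu_i\,(s + p_h\gamma)}{(s + x_{1,i})(s + x_{2,i})(s + x_{3,i})}. \tag{A2.29}$$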
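
The algebra leading from (A2.24) to (A2.28) can be spot-checked numerically. The Python sketch below evaluates the one-stage transform both as $b_i/(1 - a_i)$ and in the cubic-denominator form, and then multiplies the stages as in (A2.29). The closed-form mean in `mean_first_passage` is a consequence worked out here by differentiating the logarithm of (A2.28) at s = 0; it is not an equation taken from the text. The parameter values are those of the illustrative example of Section A2.2 ($\lambda_i = i\lambda$, $\mu_i = i\mu$).

```python
def stage_renewal(s, lam_i, mu_i, beta, gamma, p_s, p_h):
    """One-stage transform b_i / (1 - a_i), Equations (A2.25) to (A2.27)."""
    q_s, ph_g = 1.0 - p_s, p_h * gamma
    a_i = (ph_g * beta * (s + mu_i) + q_s * lam_i * mu_i * (s + ph_g)) / (
        (s + ph_g) * (s + beta + lam_i) * (s + mu_i))
    b_i = p_s * lam_i * mu_i / ((s + mu_i) * (s + beta + lam_i))
    return b_i / (1.0 - a_i)


def stage_cubic(s, lam_i, mu_i, beta, gamma, p_s, p_h):
    """One-stage transform with the cubic denominator of Equation (A2.28)."""
    ph_g = p_h * gamma
    cubic = (s ** 3 + s ** 2 * (lam_i + mu_i + beta + ph_g)
             + s * (p_s * lam_i * mu_i + beta * mu_i + lam_i * ph_g + mu_i * ph_g)
             + p_s * ph_g * lam_i * mu_i)
    return p_s * lam_i * mu_i * (s + ph_g) / cubic


def first_passage_transform(s, N, n, lam, mu, beta, gamma, p_s, p_h):
    """Product form of Equation (A2.29), assuming lam_i = i*lam and mu_i = i*mu."""
    value = 1.0
    for i in range(n + 1, N + 1):
        value *= stage_cubic(s, i * lam, i * mu, beta, gamma, p_s, p_h)
    return value


def mean_first_passage(N, n, lam, mu, beta, gamma, p_s, p_h):
    """Mean of T_{N,n}; a closed form derived for this sketch, not quoted from the text."""
    return sum((1.0 / (i * lam) + 1.0 / (i * mu) + beta / (i * lam * p_h * gamma)) / p_s
               for i in range(n + 1, N + 1))


if __name__ == "__main__":
    par = dict(lam=0.02, mu=0.05, beta=0.01, gamma=0.025, p_s=0.9, p_h=0.9)
    h = 1e-6  # step for a central-difference estimate of -dG/ds at s = 0
    numeric = -(first_passage_transform(h, 10, 0, **par)
                - first_passage_transform(-h, 10, 0, **par)) / (2.0 * h)
    print("mean of T_{10,0}:", round(mean_first_passage(10, 0, **par), 2), round(numeric, 2))
```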


Further,  let 
The distribution function of the first passage time to enter a state corresponding to a specified number of remaining software errors will be useful in the study and analysis of the other performance measures.


A2.2.2 Mean and Variance of $T_{N,n}$

Now,

$$E[T_{N,n}] = \int_0^{\infty} t\, g_{N,n}(t)\, dt.$$

Substituting for $g_{N,n}(t)$ from Equation (A2.32), we get

$$E[T_{N,n}] = \sum_{j=1}^{K} \frac{U_{n+1}(-x_j + p_h\gamma)^{N-n}}{\prod_{i=1,\, i \ne j}^{K} (-x_j + x_i)} \int_0^{\infty} t\, e^{-x_j t}\, dt \tag{A2.34}$$

or

$$E[T_{N,n}] = \sum_{j=1}^{K} \frac{U_{n+1}(-x_j + p_h\gamma)^{N-n}}{(x_j)^2 \prod_{i=1,\, i \ne j}^{K} (-x_j + x_i)}. \tag{A2.35}$$

Similarly, to get the variance of $T_{N,n}$ we have

$$E[T^2_{N,n}] = \sum_{j=1}^{K} \frac{2\,U_{n+1}(-x_j + p_h\gamma)^{N-n}}{(x_j)^3 \prod_{i=1,\, i \ne j}^{K} (-x_j + x_i)}, \tag{A2.36}$$

$$\mathrm{Var}[T_{N,n}] = E[T^2_{N,n}] - E^2[T_{N,n}]. \tag{A2.37}$$

A2.2.3 Illustrative Example

Consider a system with N = 10 errors, $p_s = 0.9$, and $p_h = 0.9$. Assume that $\lambda_i = i\lambda$, $\mu_i = i\mu$, and the parametric values are $\lambda = .02$, $\mu = .05$, $\beta = .01$, and $\gamma = .025$. We are interested in the distribution of $T_{N,n}$, n = 0, 1, 2, ..., 8, 9. The pdf's and cdf's of $T_{N,n}$ for various values of n and for t from 0 to 500 units are computed from Equations (A2.32) and (A2.33), respectively, and are shown in Figures A2.4 and A2.5, respectively. Also, the means and variances of these distributions are obtained from Equations (A2.35) and (A2.37), respectively, and are summarized in Table A2.1. From Figures A2.4 and A2.5, we notice that the distributions are highly dependent on n. Also, as expected, the distribution of the time to an error-free software system has a large mean and a large variance (see Table A2.1). The mean and variance of $T_{10,n}$ for $p_h = 1.0$ are also given in Table A2.1. We note that both of these values are smaller than those for $p_h = 0.9$ because of an improvement in the hardware system maintenance activity.

[Figure A2.4. Probability Distribution Function of First Passage Time to n. Parameters: $\lambda_i = i\lambda$, $\lambda = .02$, $\mu_i = i\mu$, $\mu = .05$, $\beta = .01$, $\gamma = .025$, N = 10; n = 9, 8, ..., 1, 0.]

[Figure A2.5. Cumulative Distribution Function of First Passage Time to n. Horizontal axis: t (x 10^-1); vertical axis: cumulative probability; $p_h = .9$.]

[Table A2.1. Mean, Variance, and Standard Deviation.]


A2.3 State Occupancy Probabilities

In this Section we are interested in deriving expressions for the probability that the system is operational at time t with a specified number of remaining software errors. Let $P_{N,n}(t)$ be the probability that the system is operational at time t with n remaining software errors, given that it was in operation at time t = 0 with N software errors, i.e.

$$P_{N,n}(t) = P\{X(t) = n \mid X(0) = N\}, \qquad n = 0, 1, \ldots, N. \tag{A2.38}$$

We call $P_{N,n}(t)$ the (operational) state occupancy probability. By conditioning on the first up-down cycle of the process and using an approach similar to that of Section A2.2 we get the following renewal equation for $P_{n,n}(t)$:

$$P_{n,n}(t) = e^{-(\lambda_n+\beta)t} + Q_{n,n} * P_{n,n}(t). \tag{A2.39}$$

By conditioning on the first passage time, we get

$$P_{N,n}(t) = P_{n,n} * G_{N,n}(t). \tag{A2.40}$$

To obtain the L-S transform of $P_{N,n}(t)$, we take the L-S transforms of Equations (A2.39) and (A2.40) and solve the resulting equations. Let the $a_i$'s, $b_i$'s, and $x_{i,j}$'s be as given in Equations (A2.25), (A2.26), and (A2.30), respectively. Then, by having $\tilde{G}_{N,N}(s) = 1$, and letting

$$A(s) = s\beta(s+\mu_n) + \lambda_n(s+p_s\mu_n)(s+p_h\gamma), \quad \text{and}$$

$$B(s) = (s+x_{1,n})(s+x_{2,n})(s+x_{3,n}),$$

the L-S transforms of $P_{N,n}(t)$ and $P_{N,0}(t)$, respectively, are

$$\tilde{P}_{N,n}(s) = \left(1 - \frac{A(s)}{B(s)}\right)\tilde{G}_{N,n}(s), \tag{A2.41}$$

$$\tilde{P}_{N,0}(s) = \left(1 - \frac{\beta}{s+\beta+p_h\gamma}\right)\tilde{G}_{N,0}(s).$$

The expressions for $P_{N,0}(t)$ and $P_{N,n}(t)$, n = 1, ..., N, are obtained from the results of Lemmas A2.1 to A2.3 as

$$P_{N,n}(t) = G_{N,n}(t) - \sum_{j=1}^{K} \frac{U_{n+1}(-x_j+p_h\gamma)^{N-n} A(-x_j)}{x_j \prod_{i=1,\, i \ne j}^{K}(-x_j + x_i)}\,(1 - e^{-x_j t}),$$

$$P_{N,0}(t) = G_{N,0}(t) - \sum_{j=1}^{K_1} \frac{\beta U_1(-x_j+p_h\gamma)^{N}}{x_j \prod_{i=1,\, i \ne j}^{K_1}(-x_j + x_i)}\,(1 - e^{-x_j t}),$$

where K = 3(N-n+1) and $K_1$ = 3N+1 are the number of roots in the denominator.

A2.4  System  Reliability  and  Availability 

A2.4.1  System  Reliability 

The reliability of a system at time x is given by

$$\bar{F}(x) = 1 - F(x),$$

where F is the life distribution of the system. The corresponding conditional reliability of a unit of age t is

$$\bar{F}(x \mid t) = \frac{\bar{F}(t+x)}{\bar{F}(t)}, \qquad \text{if } \bar{F}(t) > 0.$$

Consider our hardware-software system. At t = 0, the initial number of software errors in the system is equal to N. The reliability of the system at this stage is

$$P\{\text{up time} > x\} = P\{\min(U, T_N) > x\} = P\{U > x\} \cdot P\{T_N > x\} = e^{-(\beta+\lambda_N)x}.$$

Next, consider some time t > 0 when the system has just been repaired and there are i remaining errors. The reliability of the system is

$$P\{\text{up time} > x \mid X(t) = i\} = P\{\min(U, T_i) > x\} = e^{-(\beta+\lambda_i)x}. \tag{A2.42}$$
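Equation (A2.42) is simple enough to evaluate directly. The sketch below is illustrative only: the function name is hypothetical, and the defaults assume $\lambda_i = i\lambda$ with $\lambda = .02$ and $\beta = .01$, as in the example of Section A2.2.

```python
import math

def conditional_reliability(x, i, lam=0.02, beta=0.01):
    """P{up time > x | i remaining errors} = exp(-(beta + lambda_i) * x).

    Assumes lambda_i = i * lam, as in the appendix's illustrative example.
    """
    return math.exp(-(beta + i * lam) * x)

# Reliability over the next 10 time units with 10, 5 and 0 remaining errors.
for remaining in (10, 5, 0):
    print(remaining, conditional_reliability(10.0, remaining))
```

With no remaining software errors only the hardware failure rate is left in the exponent, so the reliability is highest for i = 0.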

A2.4.2  System  Availability 

Another useful measure of system performance is its availability, which is defined as the probability that it is operational at some given time t. In our case, the system will be operational if the hardware system is in an up state and the software system is in an up state with n remaining errors, n = 0, 1, ..., N. In Section A2.3, we derived the expressions for $P_{N,n}(t)$, the probability that the system is operational at time t with n errors in the software system, given that it was operational at t = 0 with N software errors. Thus, the system availability can be defined as

$$A(t) = \sum_{n=0}^{N} P_{N,n}(t). \tag{A2.43}$$

To see the behavior of A(t) we consider an example with N = 10, $p_s = 0.9$, $p_h = 0.9$, $\lambda = .02$, $\mu = .05$, $\beta = .01$ and $\gamma = .025$. For these values, the distributions $P_{N,n}(t)$, n = 0, 1, ..., 10, for t from 0 to 500 are obtained as described in Section A2.3. Selected values of $P_{N,n}(t)$ are given in Table A2.2 and the probability distributions are plotted in Figure A2.6 for n = 0, ..., 10.

The availability, as given by Equation (A2.43), is obtained as the sum of probabilities. Thus, for t = 100, we have

$$A(100) = \sum_{n=0}^{10} P_{10,n}(100) = 0.5612.$$

Similarly,

$$A(500) = \sum_{n=0}^{10} P_{10,n}(500) = 0.6819.$$

Values of A(t) for various t are also plotted in Figure A2.6.

[Figure A2.6. State Occupancy Probabilities and System Availability. Parameters: $\lambda_i = i\lambda$, $\lambda = .02$, $\mu_i = i\mu$, $\mu = .05$, $\beta = .01$, $\gamma = .025$, $p_s = .9$, $p_h = .9$, N = 10; n = 10, 9, ..., 1, 0.]
A2.4.3  Average  Availability 


A sampling measure for the availability of an operational system is the ratio of total up time to total time elapsed. From a practical point of view, it is an important measurable sampling characteristic.

From the definition of availability, we find that the expected value of total up time by time t can be expressed as

$$U(t) = \int_0^t A(x)\, dx.$$

The ratio of this value to the total time elapsed, t, will give us an average availability up to time t, $A_{av}(t)$, i.e.

$$A_{av}(t) = \frac{U(t)}{t} = \frac{1}{t}\int_0^t A(x)\, dx.$$

Similarly, the average unavailability can be expressed as

$$1 - A_{av}(t) = 1 - \frac{1}{t}\int_0^t A(x)\, dx = \frac{1}{t}\int_0^t \{1 - A(x)\}\, dx.$$

A2.5 Expected Number of Software, Hardware and Total Failures by Time t

A2.5.1 Expected Number of Software Failures

Let $M_s(t)$ be the expected number of software failures detected by time t. In order to find the expression for $M_s(t)$, we consider a counting process $\{N_{si}(t), t \ge 0\}$, where $N_{si}(t)$ is the number of software failures detected during the time interval (0,t], when the initial number of errors in the software system is i. Let

$$M_{si}(t) = E[N_{si}(t) \mid X(0) = i].$$

Then, by conditioning on the first passage time going from state N to i,

$$M_s(t) = \sum_{i=1}^{N} M_{si} * G_{N,i}(t), \tag{A2.45}$$

where $M_{si}(t)$ can be obtained by conditioning on the first up-down cycle of the process:

$$M_{si}(t) = Q_{i,i_s}(t) + Q_{i,i_s} * Q_{i_s,i} * M_{si}(t) + Q_{i,i_h} * Q_H * Q_{i_h,i} * M_{si}(t).$$

The Laplace-Stieltjes transform of $M_{si}(t)$ is

$$\tilde{M}_{si}(s) = \frac{\lambda_i}{s+\beta+\lambda_i} + a_i \tilde{M}_{si}(s),$$

where $a_i$ is defined in Equation (A2.25). Now,

$$\tilde{M}_{si}(s) = \frac{\lambda_i}{s+\beta+\lambda_i} \cdot \frac{1}{1 - a_i}$$

or

$$\tilde{M}_{si}(s) = \frac{\lambda_i(s+\mu_i)(s+p_h\gamma)}{(s+x_{1,i})(s+x_{2,i})(s+x_{3,i})},$$

where $x_{1,i}$, $x_{2,i}$, $x_{3,i}$ are given in Equation (A2.28). In a simplified form,

$$\tilde{M}_{si}(s) = \frac{(s+\mu_i)}{p_s\mu_i}\, \tilde{G}_{i,i-1}(s). \tag{A2.46}$$

From (A2.45) and (A2.46) the L-S transform for $M_s(t)$ is

$$\tilde{M}_s(s) = \sum_{i=1}^{N} \frac{(s+\mu_i)}{p_s\mu_i}\, \tilde{G}_{N,i-1}(s). \tag{A2.47}$$

Finally, using Lemmas A2.1 to A2.3, we obtain the expression for $M_s(t)$ as

$$M_s(t) = \sum_{i=1}^{N} \sum_{j=1}^{K_i} \frac{U_i(-x_j+p_h\gamma)^{N-i+1}(-x_j+\mu_i)}{p_s\mu_i\, x_j \prod_{\ell=1,\, \ell \ne j}^{K_i}(-x_j + x_\ell)}\,(1 - e^{-x_j t}),$$

where $K_i = 3(N-i+1)$.

A2.5.2  Expected  Number  of  Hardware  Failures 


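Let $M_h(t)$ be the expected number of hardware failures detected by time t. Consider a counting process $\{N_{hi}(t), t \ge 0\}$, where $N_{hi}(t)$ is the number of hardware failures detected during the time interval (0,t], when the initial number of errors in the software system is i. Let

$$M_{hi}(t) = E[N_{hi}(t) \mid X(0) = i].$$

Then, by conditioning on the first passage time going from state N to i,

$$M_h(t) = \sum_{i=0}^{N} M_{hi} * G_{N,i}(t), \tag{A2.49}$$

where $M_{hi}(t)$ can be obtained by conditioning on the first up-down cycle of the process:

$$M_{hi}(t) = Q_{i,i_h}(t) + Q_{i,i_h} * Q_H * Q_{i_h,i} * M_{hi}(t) + Q_{i,i_s} * Q_{i_s,i} * M_{hi}(t).$$

Now, the L-S transform of $M_{hi}(t)$ for i = 1, 2, ..., N is

$$\tilde{M}_{hi}(s) = \frac{\beta}{s+\beta+\lambda_i} + a_i \tilde{M}_{hi}(s)$$

or

$$\tilde{M}_{hi}(s) = \frac{\beta(s+\mu_i)(s+p_h\gamma)}{(s+x_{1,i})(s+x_{2,i})(s+x_{3,i})}.$$

For i = 0, this L-S transform becomes

$$\tilde{M}_{h0}(s) = \frac{\beta}{s+\beta} \cdot \frac{1}{1 - a_0}$$

or

$$\tilde{M}_{h0}(s) = \frac{\beta(s+p_h\gamma)}{s(s+\beta+p_h\gamma)}.$$

From (A2.49) the L-S transform for $M_h(t)$ is

$$\tilde{M}_h(s) = \sum_{i=0}^{N} \tilde{M}_{hi}(s)\, \tilde{G}_{N,i}(s). \tag{A2.50}$$

The inverse L-S transform of $\tilde{M}_{hi}(s)\tilde{G}_{N,i}(s)$ is

$$G_i(t) = \sum_{j=1}^{K_i} \frac{\beta U_{i+1}(-x_j+p_h\gamma)^{N-i+1}(-x_j+\mu_i)}{x_j \prod_{\ell=1,\, \ell \ne j}^{K_i}(-x_j + x_\ell)}\,(1 - e^{-x_j t}), \tag{A2.51}$$

where $K_i = 3(N-i+1)$, and the inverse L-S transform of $\tilde{M}_{h0}(s)\tilde{G}_{N,0}(s)$ is

$$G_0(t) = \frac{\beta U_1 (p_h\gamma)^{N+1}\, t}{\prod_{j=1,\, x_j \ne 0}^{K} x_j} + \sum_{j=1,\, x_j \ne 0}^{K} \frac{\beta U_1(-x_j+p_h\gamma)^{N+1}}{x_j \prod_{\ell=1,\, \ell \ne j}^{K}(-x_j + x_\ell)}\,(1 - e^{-x_j t}),$$

where K = 3N + 2.

Finally, the expression for $M_h(t)$ is

$$M_h(t) = G_0(t) + \sum_{i=1}^{N} G_i(t).$$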


A2.5.3  Expected  Number  of  Total  Failures 

Let M(t) be the expected number of total failures detected by time t.

Consider $M_i(t)$ to be the expected number of total failures when there are i software errors in the system. For any i = 0, 1, 2, ..., N,

$$M_i(t) = M_{si}(t) + M_{hi}(t), \tag{A2.52}$$

where

$$M_{s0}(t) = 0,$$

and

$$\tilde{M}_i(s) = \tilde{M}_{si}(s) + \tilde{M}_{hi}(s).$$

Then

$$M(t) = \sum_{i=0}^{N} M_i * G_{N,i}(t), \tag{A2.53}$$

$$\tilde{M}(s) = \sum_{i=0}^{N} \tilde{M}_i(s)\, \tilde{G}_{N,i}(s)$$

or

$$\tilde{M}(s) = \sum_{i=0}^{N} \left(\tilde{M}_{si}(s) + \tilde{M}_{hi}(s)\right) \tilde{G}_{N,i}(s)$$

and

$$M(t) = M_s(t) + M_h(t). \tag{A2.54}$$

TABLE A2.3

   TIME     SOFTWARE    HARDWARE     TOTAL
            FAILURES    FAILURES    FAILURES
    0.00       0.00        0.00        0.00
   20.00       2.48        0.14        2.62
   40.00       4.19        0.26        4.45
   60.00       5.47        0.38        5.85
   80.00       6.47        0.49        6.96
  100.00       7.27        0.61        7.87
  120.00       7.92        0.72        8.64
  160.00       8.91        0.94        9.85
  200.00       9.59        1.17       10.77
  240.00      10.06        1.41       11.48
  280.00      10.39        1.66       12.05
  320.00      10.62        1.92       12.53
  360.00      10.77        2.18       12.95
  400.00      10.88        2.44       13.32
  440.00      10.95        2.71       13.66
  480.00      11.00        2.98       13.99
  520.00      11.04        3.25       14.29
  560.00      11.06        3.53       14.59
  600.00      11.08        3.80       14.88
  640.00      11.09        4.08       15.17
  680.00      11.10        4.35       15.45
  720.00      11.10        4.63       15.73
  760.00      11.10        4.91       16.01
  800.00        --         5.18       16.29
  840.00        --         5.46       16.57
  880.00        --         5.74       16.85
  920.00        --         6.01       17.12
  960.00      11.11         --        17.40
 1000.00      11.11        6.57       17.68

Consider a system with an initial number of software errors, N = 10; probabilities of perfect software and hardware maintenance $p_s = .9$ and $p_h = .9$, respectively; software failure rate $\lambda_i = i\lambda$, and $\lambda = .02$; software repair rate $\mu_i = i\mu$, and $\mu = .05$; and hardware failure and repair rates $\beta = .01$ and $\gamma = .025$, respectively.

For this system, the expected numbers of software, hardware, and total system failures are computed from Equations (A2.45), (A2.49), and (A2.54), respectively. Selected values of these quantities are given in Table A2.3 and plotted in Figure A2.7. Table A2.3 shows that the number of software failures detected increases rapidly at early times (7.27 by t = 100) and then levels off (11.11 by t = 1000) as the number of remaining software errors, and hence the software failure rate, decreases.

On the other hand, the number of hardware failures detected keeps increasing after the software failures have leveled off, growing almost linearly at about 0.27 to 0.28 failures per 40 time units for t of 400 and beyond.
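The tabulated expectations can be cross-checked by simulating the failure-repair process directly. The following Monte Carlo sketch is an illustration, not the method used in the report: it assumes the transition structure implied by the L-S transforms of Section A2.2 (competing exponential failures while up, software repair at rate $\mu_i$ that removes the error with probability $p_s$, and hardware repair attempts at rate $\gamma$ repeated until one is perfect). The function name, the number of runs and the seed are arbitrary choices.

```python
import random

def simulate_failures(t_end, N=10, lam=0.02, mu=0.05, beta=0.01, gamma=0.025,
                      p_s=0.9, p_h=0.9, runs=2000, seed=1):
    """Estimate (mean software failures, mean hardware failures) by time t_end."""
    rng = random.Random(seed)
    total_sw = 0
    total_hw = 0
    for _ in range(runs):
        t = 0.0
        i = N  # remaining software errors
        while True:
            rate = beta + i * lam            # total failure rate while up
            t += rng.expovariate(rate)       # time to the next failure
            if t > t_end:
                break
            if rng.random() < beta / rate:   # the failure is a hardware failure
                total_hw += 1
                # Repair attempts are repeated until one is perfect.
                while True:
                    t += rng.expovariate(gamma)
                    if rng.random() < p_h:
                        break
            else:                            # software failure (only if i >= 1)
                total_sw += 1
                t += rng.expovariate(i * mu)  # software repair time
                if rng.random() < p_s:        # the error is actually removed
                    i -= 1
    return total_sw / runs, total_hw / runs

m_s, m_h = simulate_failures(1000.0)
print(m_s, m_h)
```

For t = 1000 the estimates should fall close to the Table A2.3 entries (11.11 software and 6.57 hardware failures), up to sampling error.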




In Section A2 we proposed a model for the operational phase of the hardware-software system and developed expressions for several performance measures. In many applications, these individual measures are of less interest than an overall measure, such as the expected total cost. With this objective, in this section we develop cost models for the hardware, software, and hardware-software systems. The principal cost components considered are the cost of a failure, the cost of the maintenance activity performed to bring the system back to an operational state, and the cost of system downtime. The primary measures that affect the total cost are the number of failures and the system availability.

The relative importance of these measures in a given situation can be expressed via the numerical values for the cost factors.

Models for hardware and software systems are developed in Sections A3.1 and A3.2, respectively. The total hardware-software system is discussed in Section A3.3. Several numerical examples are used to illustrate the results.


In this Section we develop a cost model for the hardware system. The system is in an up state at time t = 0. After a random time U, whose distribution is exponential with parameter $\beta$ (see Equation A2.5), a failure occurs and the system goes into a down state. A repair or maintenance activity is undertaken and after a random time V, whose distribution is exponential with parameter $\gamma$ (see Equation A2.6), the system is brought back into an up state. The cause of the failure would have been removed with probability $p_h$ ($0 < p_h < 1$). The sequence of up and down states forms a renewal process. For purposes of this Section, it is assumed that the software system has no effect on the operation of the hardware system.

The following costs are incurred due to the failure and maintenance activities:

(i) A fixed cost $c_{h1}$ per failure

(ii) A variable cost $c_{h2}$ per repair per unit time

(iii) A variable cost due to the unavailability of the system, $c_{h3}$ per unit time.

Consider the time interval (0,t). Let

$C_h(t)$ = expected total cost incurred by t,

$M_h(t)$ = expected number of hardware failures by t,

$A_h(t)$ = system availability at t.

Then, the expected total cost by time t is given by

$$C_h(t) = c_{h1} M_h(t) + c_{h2}\gamma t + c_{h3}\int_0^t \{1 - A_h(x)\}\, dx, \tag{A3.1}$$

where $\int_0^t \{1 - A_h(x)\}\, dx$ is the expected total down time during (0,t). Now we develop expressions for $M_h(t)$ and $A_h(t)$ and obtain a closed form equation for $C_h(t)$. Consider one up and down cycle, i.e., one renewal. If the maintenance activity is perfect, the length of this cycle will be U + V. If, however, the maintenance activity is imperfect, the repair will go another V units of time so that the length of the cycle will be U + V + V. If the maintenance is imperfect for the second time, the length of the cycle will be U + V + V + V, and so on. Therefore, the probability density function, g, of the renewal time is given by

$$g = p_h f_U * f_V + p_h q_h f_U * f_V * f_V + p_h q_h^2 f_U * f_V * f_V * f_V + \ldots, \tag{A3.2}$$

where * stands for convolution, $f_U$ is the pdf of U, and $f_V$ is the pdf of V.

The Laplace transform of g, $g^*$, is

$$g^*(s) = p_h f_U^*(s) f_V^*(s)\left[1 + q_h f_V^*(s) + (q_h f_V^*(s))^2 + \ldots\right]$$

or

$$g^*(s) = \frac{p_h f_U^*(s) f_V^*(s)}{1 - q_h f_V^*(s)}$$

or

$$g^*(s) = p_h \cdot \frac{\beta}{s+\beta} \cdot \frac{\gamma}{s+p_h\gamma}.$$

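Now, the renewal equation for the expected number of hardware failures can be written as

$$M_h(t) = F_U(t) + \int_0^t M_h(t-x)\, g(x)\, dx,$$

where $F_U(t)$ is the cdf of U.

The Laplace transform of $M_h(t)$ is

$$M_h^*(s) = \frac{f_U^*(s)}{s} + M_h^*(s)\, g^*(s) = \frac{\beta}{s(s+\beta)} + M_h^*(s) \cdot \frac{\beta}{s+\beta} \cdot \frac{p_h\gamma}{s+p_h\gamma}$$

or

$$M_h^*(s) = \frac{\beta(s+p_h\gamma)}{s^2(s+\beta+p_h\gamma)}.$$

By taking the inverse Laplace transform we get

$$M_h(t) = \frac{\beta\left[1 - e^{-(\beta+p_h\gamma)t}\right]}{\beta+p_h\gamma} + \frac{\beta p_h\gamma\left[e^{-(\beta+p_h\gamma)t} - 1 + (\beta+p_h\gamma)t\right]}{(\beta+p_h\gamma)^2}$$

or

$$M_h(t) = \frac{\beta}{(\beta+p_h\gamma)^2}\left[p_h\gamma(\beta+p_h\gamma)t + \beta\left(1 - e^{-(\beta+p_h\gamma)t}\right)\right]. \tag{A3.5}$$

The renewal equation for $A_h(t)$ can be written as

$$A_h(t) = 1 - F_U(t) + \int_0^t A_h(t-x)\, g(x)\, dx,$$

and its Laplace transform as

$$A_h^*(s) = \frac{1 - f_U^*(s)}{s\,[1 - g^*(s)]} = \frac{s+p_h\gamma}{s(s+\beta+p_h\gamma)}.$$

Therefore, the availability of the system at time t is

$$A_h(t) = \mathcal{L}^{-1}\{A_h^*(s)\} = e^{-(\beta+p_h\gamma)t} + \frac{p_h\gamma\left(1 - e^{-(\beta+p_h\gamma)t}\right)}{\beta+p_h\gamma}$$

or

$$A_h(t) = \frac{p_h\gamma + \beta e^{-(\beta+p_h\gamma)t}}{\beta+p_h\gamma}. \tag{A3.6}$$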

Now, the expected total down time during (0,t) is $\int_0^t \{1 - A_h(x)\}\, dx$ which, on substituting for $A_h(x)$ from Equation (A3.6), gives

$$\int_0^t (1 - A_h(x))\, dx = \frac{\beta\left[(\beta+p_h\gamma)t - 1 + e^{-(\beta+p_h\gamma)t}\right]}{(\beta+p_h\gamma)^2}. \tag{A3.7}$$

On substituting the expressions for $M_h(t)$ and $\int_0^t \{1 - A_h(x)\}\, dx$ from Equations (A3.5) and (A3.7), respectively, in Equation (A3.1), we get, after some simplification,

$$C_h(t) = \frac{c_{h1}\beta}{(\beta+p_h\gamma)^2}\left[p_h\gamma(\beta+p_h\gamma)t + \beta\left(1 - e^{-(\beta+p_h\gamma)t}\right)\right] + c_{h2}\gamma t + \frac{c_{h3}\beta}{(\beta+p_h\gamma)^2}\left[(\beta+p_h\gamma)t - 1 + e^{-(\beta+p_h\gamma)t}\right]. \tag{A3.8}$$

The above equation gives the expected cost incurred by time t in terms of the hardware system parameters $\beta$, $\gamma$, and $p_h$ and the cost factors $c_{h1}$, $c_{h2}$, and $c_{h3}$.
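The closed forms (A3.5) to (A3.8) translate directly into code. The sketch below is a minimal illustration with hypothetical function names; it evaluates the expected number of failures, the availability, the expected down time and the expected cost for one parameter set, chosen here as $\beta = .01$, $\gamma = .10$, $p_h = .9$ and unit cost factors of 10.

```python
import math

def hardware_measures(t, beta, gamma, p_h):
    """Return (M_h(t), A_h(t), expected down time in (0, t))."""
    a = p_h * gamma          # effective repair rate p_h * gamma
    b = beta + a             # beta + p_h * gamma
    decay = math.exp(-b * t)
    m_h = beta / b**2 * (a * b * t + beta * (1.0 - decay))   # Equation (A3.5)
    a_h = (a + beta * decay) / b                              # Equation (A3.6)
    downtime = beta / b**2 * (b * t - 1.0 + decay)            # Equation (A3.7)
    return m_h, a_h, downtime

def hardware_cost(t, beta, gamma, p_h, c1, c2, c3):
    """Expected total cost by time t, Equation (A3.1) / (A3.8)."""
    m_h, _, downtime = hardware_measures(t, beta, gamma, p_h)
    return c1 * m_h + c2 * gamma * t + c3 * downtime

t = 100.0
m_h, a_h, down = hardware_measures(t, beta=0.01, gamma=0.10, p_h=0.9)
print("expected failures:", m_h)
print("average availability:", 1.0 - down / t)
print("cost per unit time:", hardware_cost(t, 0.01, 0.10, 0.9, 10, 10, 10) / t)
```

Note that $M_h(t)$ equals $\beta$ times the expected up time $t - \int_0^t \{1 - A_h(x)\}\,dx$, which is why the two columns of a given row in the availability and failure tables are proportional.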


Illustrative Examples

We numerically study the behavior of $M_h(t)$, $A_{h,av}(t)$ and $C_h(t)/t$ as a function of the cost factors $c_{h1}$, $c_{h2}$, $c_{h3}$ and of the failure and repair rates $\beta$ and $\gamma$, respectively.

Consider a system with $\beta = .01$, $p_h = 0.9$, and $\gamma$ = .01, .02, .05, .10, .20, .30, .50, .75, 1.0, 1.5, 2.0, 3.0, and 4.0. The average availability ($A_{h,av}(t)$) and the expected number of failures by time t are shown in Table A3.1 for t = 100, 250, 500, 1000, and 2000. We notice that for a fixed repair rate the average availability decreases with time, the rate of decrease being higher for low values of $\gamma$.

The expected number of failures in a given time interval increases with $\gamma$. This is so because at low values of $\gamma$, the system is down for longer periods of time, causing a reduction in the up time of the system, and failures can occur only while the system is up.

The expected total cost per unit time ($C_h(t)/t$) is now calculated from Equation (A3.8) for given cost factors. Such values for four sets of cost factors are given in Table A3.2. For a given t, the cost first decreases and then increases as a function of $\gamma$. In other words, $C_h(t)/t$ seems to be a convex function with respect to $\gamma$.
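The decrease-then-increase pattern of the cost per unit time can be seen by scanning the tabulated repair rates and picking the cheapest one. This is a sketch under stated assumptions: the function names and the grid constant are hypothetical, the cost function restates Equation (A3.8), and the search is a plain minimum over the repair rates used in the tables rather than a continuous optimization.

```python
import math

# Repair rates used in the appendix's tables.
GRID = [0.01, 0.02, 0.05, 0.10, 0.20, 0.30, 0.50, 0.75,
        1.0, 1.5, 2.0, 3.0, 4.0]

def cost_rate(t, gamma, beta=0.01, p_h=0.9, c1=10.0, c2=10.0, c3=10.0):
    """Expected total cost per unit time, C_h(t)/t, from Equation (A3.8)."""
    a = p_h * gamma
    b = beta + a
    decay = math.exp(-b * t)
    failures = beta / b**2 * (a * b * t + beta * (1.0 - decay))
    downtime = beta / b**2 * (b * t - 1.0 + decay)
    return (c1 * failures + c2 * gamma * t + c3 * downtime) / t

def cheapest_rate(t, **costs):
    """Repair rate on GRID with the lowest cost per unit time."""
    return min(GRID, key=lambda g: cost_rate(t, g, **costs))

print(cheapest_rate(500.0))             # all cost factors equal to 10
print(cheapest_rate(500.0, c3=100.0))   # down time ten times as expensive
```

Raising the down-time cost factor shifts the cheapest repair rate upward, since faster repair buys more availability per unit of repair cost.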


TABLE A3.1

AVERAGE AVAILABILITY AND EXPECTED NUMBER OF FAILURES
HARDWARE SYSTEM
FAILURE RATE: 0.010

AVERAGE AVAILABILITY

REPAIR                          TIME
RATE       100.0      250.0      500.0     1000.0     2000.0
0.01    0.709261   0.583529   0.529082   0.501385   0.487535
0.02    0.762652   0.693831   0.668367   0.655612   0.649235
0.05    0.851105   0.831405   0.824793   0.821488   0.819835
0.10    0.910000   0.904000   0.902000   0.901000   0.900500
0.20    0.950139   0.948476   0.947922   0.947645   0.947507
0.30    0.965561   0.964796   0.964541   0.964413   0.964349
0.50    0.978733   0.978450   0.978355   0.978308   0.978284
0.75    0.985615   0.985487   0.985444   0.985423   0.985412
1.00    0.989132   0.989059   0.989035   0.989023   0.989017
1.50    0.992701   0.992669   0.992658   0.992652   0.992650
2.00    0.994506   0.994487   0.994481   0.994478   0.994477
3.00    0.996324   0.996315   0.996313   0.996311   0.996311
4.00    0.997238   0.997233   0.997231   0.997231   0.997230

EXPECTED NUMBER OF HARDWARE FAILURES

REPAIR                          TIME
RATE       100.0      250.0      500.0     1000.0     2000.0
0.01      0.7093     1.4588     2.6454     5.0139     9.7507
0.02      0.7627     1.7346     3.3418     6.5561    12.9847
0.05      0.8511     2.0785     4.1240     8.2149    16.3967
0.10      0.9100     2.2600     4.5100     9.0100    18.0100
0.20      0.9501     2.3712     4.7396     9.4765    18.9501
0.30      0.9656     2.4120     4.8227     9.6441    19.2870
0.50      0.9787     2.4461     4.8918     9.7831    19.5657
0.75      0.9856     2.4637     4.9272     9.8542    19.7082
1.00      0.9891     2.4726     4.9452     9.8902    19.7803
1.50      0.9927     2.4817     4.9633     9.9265    19.8530
2.00      0.9945     2.4862     4.9724     9.9448    19.8895
3.00      0.9963     2.4908     4.9816     9.9631    19.9262
4.00      0.9972     2.4931     4.9862     9.9723    19.9446

TABLE A3.2

EXPECTED TOTAL COST PER UNIT TIME
HARDWARE SYSTEM
FAILURE RATE: 0.010

CH1=10, CH2=10, and CH3=10

REPAIR                          TIME
RATE       100.0      250.0      500.0     1000.0     2000.0
0.01      3.0783     4.3231     4.8621     5.1363     5.2734
0.02      2.6497     3.3311     3.5832     3.7094     3.7726
0.05      2.0741     2.2691     2.3345     2.3673     2.3836
0.10      1.9910     2.0504     2.0702     2.0801     2.0850
0.20      2.5936     2.6101     2.6156     2.6183     2.6197
0.30      3.4409     3.4485     3.4510     3.4523     3.4529
0.50      5.3105     5.3133     5.3143     5.3147     5.3150
0.75      7.7424     7.7437     7.7441     7.7443     7.7444
1.00     10.2076    10.2083    10.2086    10.2087    10.2087
1.50     15.1723    15.1726    15.1727    15.1727    15.1728
2.00     20.1544    20.1546    20.1546    20.1547    20.1547
3.00     30.1364    30.1365    30.1365    30.1365    30.1365
4.00     40.1273    40.1274    40.1274    40.1274    40.1274

CH1=100, CH2=10, and CH3=10

REPAIR                          TIME
RATE       100.0      250.0      500.0     1000.0     2000.0
0.01      3.7167     4.8482     5.3383     5.5875     5.7122
0.02      3.3361     3.9555     4.1847     4.2995     4.3569
0.05      2.8401     3.0174     3.0769     3.1066     3.1215
0.10      2.8100     2.8640     2.8820     2.8910     2.8955
0.20      3.4488     3.4637     3.4687     3.4712     3.4724
0.30      4.3099     4.3168     4.3191     4.3203     4.3209
0.50      6.1914     6.1940     6.1948     6.1952     6.1954
0.75      8.6295     8.6306     8.6310     8.6312     8.6313
1.00     11.0978    11.0985    11.0987    11.0988    11.0988
1.50     16.0657    16.0660    16.0661    16.0661    16.0662
2.00     21.0494    21.0496    21.0497    21.0497    21.0497
3.00     31.0331    31.0332    31.0332    31.0332    31.0332
4.00     41.0249    41.0249    41.0249    41.0249    41.0249

TABLE A3.2 (CONTINUED)

EXPECTED TOTAL COST PER UNIT TIME
HARDWARE SYSTEM
FAILURE RATE: 0.010

CH1=10, CH2=10, and CH3=100

REPAIR                          TIME
RATE       100.0      250.0      500.0     1000.0     2000.0
0.01     29.2448    41.8055    47.2447    50.0116    51.3953
0.02     24.0111    30.8863    33.4301    34.7043    35.3415
0.05     15.4747    17.4426    18.1031    18.4334    18.5985
0.10     10.0910    10.6904    10.8902    10.9901    11.0401
0.20      7.0812     7.2472     7.3025     7.3302     7.3441
0.30      6.5404     6.6169     6.6424     6.6551     6.6615
0.50      7.2245     7.2529     7.2623     7.2670     7.2694
0.75      9.0371     9.0499     9.0541     9.0563     9.0573
1.00     11.1857    11.1930    11.1954    11.1966    11.1972
1.50     15.8292    15.8324    15.8335    15.8340    15.8343
2.00     20.6489    20.6507    20.6513    20.6516    20.6518
3.00     30.4673    30.4681    30.4684    30.4685    30.4686
4.00     40.3760    40.3764    40.3766    40.3767    40.3767

CH1=10, CH2=100, and CH3=10

REPAIR                          TIME
RATE       100.0      250.0      500.0     1000.0     2000.0
0.01      3.9783     5.2231     5.7621     6.0363     6.1734
0.02      4.4497     5.1311     5.3832     5.5094     5.5726
0.05      6.5741     6.7691     6.8345     6.8673     6.8836
0.10     10.9910    11.0504    11.0702    11.0801    11.0851
0.20     20.5936    20.6101    20.6156    20.6183    20.6197
0.30     30.4409    30.4485    30.4510    30.4523    30.4529
0.50     50.3105    50.3133    50.3143    50.3147    50.3150
0.75     75.2424    75.2437    75.2441    75.2443    75.2444
1.00    100.2076   100.2083   100.2086   100.2087   100.2087
1.50    150.1723   150.1726   150.1727   150.1727   150.1728
2.00    200.1544   200.1546   200.1546   200.1547   200.1547
3.00    300.1364   300.1365   300.1365   300.1365   300.1365
4.00    400.1273   400.1274   400.1274   400.1274   400.1274

Plots of $A_{h,av}(t)$ for $\beta$ = .01, .05, and 0.10 versus $\gamma$ are shown in Figure A3.1. As expected, the average availability improves with $\gamma$ as well as with an improvement in the failure rate, i.e., as $\beta$ goes from 0.10 to 0.01. Costs per unit time for various t are shown in Figure A3.2 for $\beta = .01$, $c_{h1} = 10$, $c_{h2} = 10$, $c_{h3} = 100$ as a function of $\gamma$ and clearly show the convexity of the cost function. A similar pattern is seen in Figure A3.3, which gives the plots of $C_h(t)/t$ for the four sets of cost factors at time t = 500 and $\beta = .01$.


A3.2 Operational Cost Model: Software System

Consider a system consisting of software only. At time zero it is operational with N errors in the system. A failure occurs at a random time, $T_N$, whose distribution is given by Equation A2.2 with parameter $\lambda_N$. A repair is undertaken and with probability $p_s$ the error causing the failure is removed in a time $w_N$ whose distribution is exponential with parameter $\mu_N$ (see Equation A2.4). The next cycle starts with (N-1) errors in the system and the failure distribution is now exponential with a parameter $(N-1)\lambda$. If the error is not removed, which happens with probability $q_s = 1 - p_s$, the distribution of time to next failure is again exponential with parameter $\lambda_N$. A similar behavior is observed throughout the entire life cycle of the software system with i ($0 \le i \le N$) remaining errors.

Note that the model is similar to the Imperfect Maintenance Model (IMM) of Okumoto and Goel (1978).

A diagrammatic representation of the behavior of the software system is shown in Figure A3.4.

As discussed in Section A3.1 for a hardware system, the cost elements associated with the failure-repair cycles of the software system are

$c_{s1}$ = cost of a software failure,

$c_{s2}$ = cost incurred per repair per unit time,

and $c_{s3}$ = cost of system down time per unit time.

Then, the expected total operational cost by time t is given by

$$C_s(t) = c_{s1} M_s(t) + c_{s2}\mu t + c_{s3}\int_0^t \{1 - A_s(x)\}\, dx, \tag{A3.9}$$

where

$M_s(t)$ = expected number of software failures by time t,
$A_s(t)$ = availability of the software system at time t.

To get the expressions for $M_s(t)$ and $A_s(t)$, we first give the Laplace-Stieltjes transforms of the appropriate quantities as follows.

Let $G_{N,i}(t)$ be the distribution function of the first passage time from state N to state i. By considering the renewal equation associated with this, the Laplace-Stieltjes transform of $G_{N,i}(t)$ is obtained as

$$\tilde{G}_{N,i}(s) = \prod_{j=i+1}^{N} \frac{p_s\lambda_j\mu_j}{s^2 + s(\lambda_j+\mu_j) + p_s\lambda_j\mu_j}. \tag{A3.10}$$

Similarly, the L-S transforms of $M_s(t)$ and $A_s(t)$ are given by

$$\tilde{M}_s(s) = \sum_{i=1}^{N} \frac{\lambda_i(s+\mu_i)}{s^2 + s(\lambda_i+\mu_i) + p_s\lambda_i\mu_i}\, \tilde{G}_{N,i}(s), \tag{A3.11}$$

$$\tilde{A}_s(s) = \sum_{i=0}^{N} \left(1 - \frac{\lambda_i(s+p_s\mu_i)}{s^2 + s(\lambda_i+\mu_i) + p_s\lambda_i\mu_i}\right) \tilde{G}_{N,i}(s), \tag{A3.12}$$

where $\tilde{G}_{N,N}(s) = 1$.

Unlike the hardware system discussed in the previous Section, the results for $M_s(t)$ and $A_s(t)$ cannot be obtained in a closed form, but can be derived from Equations (A3.11) and (A3.12) by using Lemmas A2.1, A2.2, and A2.3 as follows.

First we obtain the inverse Laplace-Stieltjes transform for Equation (A3.10). We write

$$\tilde{G}_{N,i}(s) = \prod_{j=i+1}^{N} \frac{p_s\lambda_j\mu_j}{(s+x_{1,j})(s+x_{2,j})},$$

where $-x_{1,j}$ and $-x_{2,j}$ are the roots of $s^2 + s(\lambda_j+\mu_j) + p_s\lambda_j\mu_j$. Let $x_{1,i+1} = x_1$, $x_{2,i+1} = x_2$, $x_{1,i+2} = x_3$, $x_{2,i+2} = x_4$, ..., $x_{2,N} = x_{K_i}$, where $K_i = 2(N-i)$,


and  let 


N 


N 


as  given  in  Equation  (A2.33) 
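By Lemmas A2.1-A2.3,

$$G_{N,i}(t) = \sum_{j=1}^{K_i} \frac{\left(\prod_{k=i+1}^{N} p_s\lambda_k\mu_k\right)(e^{-x_j t} - 1)}{-x_j \prod_{\ell=1,\, \ell \ne j}^{K_i}(-x_j + x_\ell)}.$$

Similarly, we get

$$\tilde{M}_s(s) = \sum_{i=1}^{N} \frac{\lambda_i(s+\mu_i)}{(s+x_{1,i})(s+x_{2,i})} \prod_{j=i+1}^{N} \frac{p_s\lambda_j\mu_j}{(s+x_{1,j})(s+x_{2,j})}$$

$$= \sum_{i=1}^{N} \frac{\lambda_i(s+\mu_i)\prod_{k=i+1}^{N} p_s\lambda_k\mu_k}{\prod_{k=i}^{N}(s+x_{1,k})(s+x_{2,k})},$$

so that

$$M_s(t) = \sum_{i=1}^{N} \lambda_i \left(\prod_{k=i+1}^{N} p_s\lambda_k\mu_k\right) \sum_{j=1}^{K_{i-1}} \frac{(-x_j + \mu_i)(e^{-x_j t} - 1)}{-x_j \prod_{\ell=1,\, \ell \ne j}^{K_{i-1}}(-x_j + x_\ell)}, \tag{A3.13}$$

where, for each i, the inner sum runs over the $K_{i-1} = 2(N-i+1)$ roots $x_{1,k}$, $x_{2,k}$, k = i, ..., N.

For the availability, taking the inverse L-S transform of Equation (A3.12), we have

$$A_s(t) = \sum_{i=0}^{N} \left\{ G_{N,i}(t) - G_{N,i-1}(t) - \lambda_i \left(\prod_{k=i+1}^{N} p_s\lambda_k\mu_k\right) \sum_{j=1}^{K_{i-1}} \frac{e^{-x_j t}}{\prod_{\ell=1,\, \ell \ne j}^{K_{i-1}}(-x_j + x_\ell)} \right\}, \tag{A3.14}$$

where the roots are labeled as in Equation (A3.13), and for i = 0 the last two terms vanish ($\lambda_0 = 0$ and $G_{N,-1}(t)$ is taken as 0).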


For given $c_{s1}$, $c_{s2}$, and $c_{s3}$, the expected total cost can be obtained from Equation (A3.9) by substituting for $M_s(t)$ and $A_s(t)$ from Equations (A3.13) and (A3.14), respectively.


Illustrative  Examples 

Now we numerically study the behavior of $M_s(t)$, $A_{s,av}(t)$ and $C_s(t)/t$ as a function of the software repair rate $\mu$, failure rate $\lambda$, and the cost factors $c_{s1}$, $c_{s2}$, and $c_{s3}$.

Let us consider a system with $N = 10$, $\lambda = 0.05$, and $p_s = 0.9$. The values of $A_{s,av}(t)$ and $M_s(t)$ computed from the formulae derived earlier in this section are given in Table A3.3 for various values of $\mu$ and $t$. From the table we note that the average availability improves with $t$ as well as with $\mu$. The improvement with $t$ is due to the fact that, as more software errors are removed, the system fails less often. The improvement with repair rate is due to shorter down time.

The expected number of failures increases with $t$ and with $\mu$. As the system is used for a longer time, more errors surface, resulting in more failures. Also, as $\mu$ improves, the system is up for longer periods of time, resulting in more software failures. Note that the asymptotic value of $M_s(t)$ is simply the ratio $N/p_s = 10/0.9 \approx 11.1111$. Plots of $A_{s,av}(t)$ for $\lambda = .01$, $.05$, and $.10$ versus $\mu$ are shown in Figure A3.5. As one would expect, availability improves as $\mu$ goes up and also as $\lambda$ goes from 0.10 to 0.05 to 0.01.
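The asymptotic value $N/p_s$ can be illustrated with a short calculation. This is a sketch under an assumption not restated in this section: that each repair removes the underlying error with probability $p_s$, independently of earlier repairs, so the number of failures caused by one error is geometric with mean $1/p_s$. The function names are hypothetical.

```python
def asymptotic_failures(n_errors, p_perfect_repair):
    """Limiting expected number of failures, N / p_s."""
    return n_errors / p_perfect_repair


def geometric_mean_failures(p_perfect_repair, max_terms=10_000):
    """Mean number of failures per error, summed term by term.

    Assumes the k-th failure is the last one with probability p * (1 - p)^(k - 1).
    """
    p = p_perfect_repair
    return sum(k * p * (1.0 - p) ** (k - 1) for k in range(1, max_terms + 1))


n_errors, p_s = 10, 0.9
print(asymptotic_failures(n_errors, p_s))
print(n_errors * geometric_mean_failures(p_s))
```

Both lines give the same limit, which agrees with the value 11.1111 that the tabulated $M_s(t)$ approaches for large $t$.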

The expected total cost per unit time, $C_s(t)/t$, is given in Table A3.4 for $\lambda = 0.05$, $N = 10$, $p_s = 0.9$, $\mu$ varying from 0.01 to 4.00, $t$ from 100 to 2000, and the cost factors varying as follows:

    Set    c_s1    c_s2    c_s3
     1      10      10      10
     2     100      10      10
     3      10      10     100
     4      10     100      10

Two additional sets of plots of the cost values versus $\mu$, taken from the above tables, are shown in Figures A3.6 and A3.7.
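The tabulated costs can be reproduced from the tabulated availability and failure counts. The sketch below assumes that the cost per unit time has the form $c_{s1} M_s(t)/t + c_{s2}\mu + c_{s3}(1 - A_{s,av}(t))$, which is the software-only analogue of Equation (A3.15) divided by $t$. Equation (A3.9) itself is not reproduced in this section, so this form is an inference checked only against the table entries. The function name is hypothetical.

```python
def cost_per_unit_time(c1, c2, c3, repair_rate, t, expected_failures, avg_availability):
    """Assumed form of C_s(t)/t: failure cost + repair cost + downtime cost."""
    failure_cost = c1 * expected_failures / t          # c_s1 * M_s(t) / t
    repair_cost = c2 * repair_rate                     # c_s2 * mu
    downtime_cost = c3 * (1.0 - avg_availability)      # c_s3 * (1 - A_av(t))
    return failure_cost + repair_cost + downtime_cost


# Inputs taken from Table A3.3 (repair rate 0.01).
low_t = cost_per_unit_time(10, 10, 10, 0.01, 100.0, 6.6245, 0.196539)
high_t = cost_per_unit_time(10, 10, 100, 0.01, 2000.0, 11.1111, 0.837280)
print(f"{low_t:.4f}")
print(f"{high_t:.4f}")
```

With these inputs the sketch agrees, to within rounding, with the Table A3.4 entries 8.7971 (cost factors 10, 10, 10 at $t = 100$) and 16.4276 (cost factors 10, 10, 100 at $t = 2000$).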
TABLE A3.3


AVERAGE AVAILABILITY AND EXPECTED NUMBER OF FAILURES
SOFTWARE SYSTEM
FAILURE RATE = 0.050

AVERAGE AVAILABILITY

REPAIR                           TIME
RATE        100.0      250.0      500.0     1000.0     2000.0

0.01     0.196539   0.215310   0.392714   0.674846   0.837280
0.02     0.322907   0.424598   0.675202   0.837280   0.918640
0.05     0.548578   0.740689   0.869824   0.934912   0.967456
0.10     0.722236   0.869873   0.934912   0.967456   0.983728
0.20     0.849874   0.934920   0.967456   0.983728   0.991864
0.30     0.898108   0.956612   0.978304   0.989152   0.994576
0.50     0.938155   0.973966   0.986982   0.993491   0.996746
0.75     0.958565   0.982644   0.991322   0.995661   0.997830
1.00     0.968853   0.986983   0.993491   0.996746   0.998373
1.50     0.979190   0.991322   0.995661   0.997830   0.998915
2.00     0.984376   0.993491   0.996746   0.998373   0.999186
3.00     0.989574   0.995661   0.997830   0.998915   0.999458
4.00     0.992176   0.996746   0.998373   0.999186   0.999593

EXPECTED NUMBER OF SOFTWARE FAILURES

REPAIR                           TIME
RATE        100.0      250.0      500.0     1000.0     2000.0

0.01       6.6245    10.1158    11.0700    11.1109    11.1111
0.02       8.6796    11.0015    11.1106    11.1111    11.1111
0.05      10.3617    11.1095    11.1111    11.1111    11.1111
0.10      10.8006    11.1109    11.1111    11.1111    11.1111
0.20      10.9340    11.1110    11.1111    11.1111    11.1111
0.30      10.9650    11.1110    11.1111    11.1111    11.1111
0.50      10.9858    11.1110    11.1111    11.1111    11.1111
0.75      10.9949    11.1110    11.1111    11.1111    11.1111
1.00      10.9992    11.1110    11.1111    11.1111    11.1111
1.50      11.0034    11.1110    11.1111    11.1111    11.1111
2.00      11.0054    11.1111    11.1111    11.1111    11.1111
3.00      11.0073    11.1111    11.1111    11.1111    11.1111
4.00      11.0083    11.1111    11.1111    11.1111    11.1111

TABLE A3.4

EXPECTED TOTAL COST PER UNIT TIME
SOFTWARE SYSTEM
FAILURE RATE = 0.050

c_s1 = 10, c_s2 = 10, AND c_s3 = 10

REPAIR                           TIME
RATE        100.0      250.0      500.0     1000.0     2000.0

0.01       8.7971     8.3515     6.3943     3.4626     1.7828
0.02       7.8389     6.3941     3.6702     1.9383     1.0692
0.05       6.0504     3.5375     2.0240     1.2620     0.8810
0.10       4.8577     2.7457     1.8731     1.4366     1.2183
0.20       4.5947     3.0952     2.5477     2.2738     2.1369
0.30       5.1154     3.8783     3.4392     3.2196     3.1098
0.50       6.7170     5.7048     5.3524     5.1762     5.0881
0.75       9.0138     8.1180     7.8090     7.6545     7.5773
1.00      11.4114    10.5746    10.2873    10.1437    10.0718
1.50      16.3084    15.5312    15.2656    15.1328    15.0664
2.00      21.2568    20.5095    20.2548    20.1274    20.0637
3.00      31.2050    30.4878    30.2439    30.1220    30.0610
4.00      41.1791    40.4770    40.2385    40.1192    40.0596

c_s1 = 100, c_s2 = 10, AND c_s3 = 10

REPAIR                           TIME
RATE        100.0      250.0      500.0     1000.0     2000.0

0.01      14.7591    11.9932     8.3869     4.4626     2.2828
0.02      15.6505    10.3546     5.6701     2.9383     1.5692
0.05      15.3759     7.5369     4.0240     2.2620     1.3810
0.10      14.5783     6.7456     3.8731     2.4366     1.7183
0.20      14.4352     7.0952     4.5477     3.2738     2.6369
0.30      14.9839     7.8783     5.4392     4.2196     3.6098
0.50      16.6042     9.7048     7.3524     6.1762     5.5881
0.75      18.9093    12.1180     9.8090     8.6545     8.0773
1.00      21.3107    14.5746    12.2873    11.1437    10.5718
1.50      26.2115    19.5312    17.2656    16.1328    15.5664
2.00      31.1616    24.5095    22.2548    21.1274    20.5637
3.00      41.1116    34.4878    32.2439    31.1220    30.5610
4.00      51.0865    44.4770    42.2385    41.1192    40.5596

TABLE A3.4
(CONTINUED)


EXPECTED TOTAL COST PER UNIT TIME
SOFTWARE SYSTEM
FAILURE RATE = 0.050

c_s1 = 10, c_s2 = 10, AND c_s3 = 100

REPAIR                           TIME
RATE        100.0      250.0      500.0     1000.0     2000.0

0.01      81.1085    78.9736    61.0500    32.7265    16.4276
0.02      68.7773    58.1803    32.9020    16.5832     8.3916
0.05      46.6783    26.8755    13.7398     7.1199     3.8100
0.10      29.8565    14.4571     7.7310     4.3655     2.6828
0.20      18.1060     8.9525     5.4766     3.7383     2.8692
0.30      14.2857     7.7833     5.3918     4.1959     3.5980
0.50      12.2831     8.0478     6.5240     5.7620     5.3810
0.75      12.7429     9.6800     8.5901     8.0450     7.7725
1.00      14.2146    11.7461    10.8731    10.4366    10.2183
1.50      18.1813    16.3122    15.6561    15.3281    15.1640
2.00      22.6629    21.0953    20.5477    20.2738    20.1369
3.00      32.1434    30.8783    30.4392    30.2196    30.1098
4.00      41.8832    40.7699    40.3849    40.1925    40.0962

c_s1 = 10, c_s2 = 100, AND c_s3 = 10

REPAIR                           TIME
RATE        100.0      250.0      500.0     1000.0     2000.0

0.01       9.6971     9.2515     7.2943     4.3626     2.6828
0.02       9.6389     8.1941     5.4702     3.7383     2.8692
0.05      10.5504     8.0375     6.5240     5.7620     5.3810
0.10      13.8577    11.7457    10.8731    10.4366    10.2183
0.20      22.5947    21.0952    20.5477    20.2738    20.1369
0.30      32.1154    30.8783    30.4392    30.2196    30.1098
0.50      51.7170    50.7048    50.3524    50.1762    50.0881
0.75      76.5138    75.6180    75.3090    75.1545    75.0773
1.00     101.4114   100.5746   100.2873   100.1437   100.0718
1.50     151.3084   150.5312   150.2656   150.1328   150.0664
2.00     201.2568   200.5095   200.2548   200.1274   200.0637
3.00     301.2050   300.4878   300.2439   300.1220   300.0610
4.00     401.1791   400.4770   400.2385   400.1192   400.0596
Figure A3.6 shows how the average cost changes with $\mu$ for the four sets of cost factors. The minimum for these plots occurs at different values of $\mu$ due to changes in the cost factors. In Figure A3.7, the average costs are shown for various time horizons. We note that all the curves follow a similar pattern, i.e., first decreasing with $\mu$ and then increasing.
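The shift of the minimum can be read directly from the tabulated costs. The sketch below picks the cheapest tabulated repair rate in the $t = 500$ column of Table A3.4 for each set of cost factors. It only searches the thirteen tabulated rates, so the true minimizing rate may lie between grid points. The variable and function names are hypothetical.

```python
REPAIR_RATES = [0.01, 0.02, 0.05, 0.10, 0.20, 0.30, 0.50,
                0.75, 1.00, 1.50, 2.00, 3.00, 4.00]

# Cost per unit time at t = 500 from Table A3.4, keyed by (c_s1, c_s2, c_s3).
COST_AT_T500 = {
    (10, 10, 10): [6.3943, 3.6702, 2.0240, 1.8731, 2.5477, 3.4392, 5.3524,
                   7.8090, 10.2873, 15.2656, 20.2548, 30.2439, 40.2385],
    (100, 10, 10): [8.3869, 5.6701, 4.0240, 3.8731, 4.5477, 5.4392, 7.3524,
                    9.8090, 12.2873, 17.2656, 22.2548, 32.2439, 42.2385],
    (10, 10, 100): [61.0500, 32.9020, 13.7398, 7.7310, 5.4766, 5.3918, 6.5240,
                    8.5901, 10.8731, 15.6561, 20.5477, 30.4392, 40.3849],
    (10, 100, 10): [7.2943, 5.4702, 6.5240, 10.8731, 20.5477, 30.4392, 50.3524,
                    75.3090, 100.2873, 150.2656, 200.2548, 300.2439, 400.2385],
}


def cheapest_rate(rates, costs):
    """Return (repair rate, cost) at the smallest tabulated cost."""
    best = min(range(len(costs)), key=lambda k: costs[k])
    return rates[best], costs[best]


for factors, costs in COST_AT_T500.items():
    print(factors, cheapest_rate(REPAIR_RATES, costs))
```

On this grid a larger downtime cost $c_{s3}$ moves the minimum to a higher repair rate, while a larger repair cost $c_{s2}$ moves it to a lower one.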


Figure A3.5. Average Availability vs. Repair Rate for Different Failure Rates (t = 500).

Figure A3.7. Expected Total Cost/Unit Time vs. Repair Rate for Different Times (λ = .05, c_s1 = 10, c_s2 = 10, c_s3 = 1


A3.3  Operational Cost Model: Hardware-Software System

We consider a hardware-software system whose behavior is the same as the system discussed in Section A2.1. The cost models for hardware only and software only systems were discussed in the previous Sections, and the operational cost of the hardware-software system is basically the sum of both the operational costs. However, the performance measures required for the hardware-software system are not simply the sums of the corresponding measures for the two separate systems.

The cost elements associated with the operation of this system are $c_{h1}$, $c_{h2}$, and $c_{h3}$, as defined in Section A3.1, and $c_{s1}$, $c_{s2}$, and $c_{s3}$, as defined in Section A3.2.

The performance measures required for the cost model are: $M_h(t)$ and $M_s(t)$, the expected number of hardware and software failures by time $t$, respectively, and the expected total down time during $(0,t)$.

Let $C(t)$ be the expected total operational cost associated with the hardware-software system and let $c_{h3} = c_{s3} = c_3$. Then

$$
C(t) = c_{h1} M_h(t) + c_{s1} M_s(t) + c_{h2} \gamma t + c_{s2} \mu t + c_3 \int_0^t \left( 1 - A(x) \right) dx , \tag{A3.15}
$$
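Equation (A3.15) can be evaluated numerically once $M_h(t)$, $M_s(t)$ and $A(x)$ are available. The following is a minimal sketch, assuming $\gamma$ and $\mu$ denote the hardware and software repair rates and that the availability is supplied as a Python function. The expressions for $A(x)$ from Section A2 are not reproduced here, so the constant availability in the example is purely illustrative, as are the function names and the trapezoidal rule used for the integral.

```python
def expected_downtime(availability, t, steps=1000):
    """Trapezoidal estimate of the integral of (1 - A(x)) over (0, t)."""
    h = t / steps
    total = 0.5 * ((1.0 - availability(0.0)) + (1.0 - availability(t)))
    for k in range(1, steps):
        total += 1.0 - availability(k * h)
    return total * h


def total_cost(c_h1, c_s1, c_h2, c_s2, c_3, m_h, m_s, gamma, mu, t, availability):
    """Expected total operational cost in the form of Equation (A3.15)."""
    failure_cost = c_h1 * m_h + c_s1 * m_s
    repair_cost = c_h2 * gamma * t + c_s2 * mu * t
    downtime_cost = c_3 * expected_downtime(availability, t)
    return failure_cost + repair_cost + downtime_cost


# Illustrative inputs only: constant availability 0.9, two hardware and
# three software failures expected by t = 100.
cost = total_cost(10, 10, 10, 10, 10, m_h=2.0, m_s=3.0,
                  gamma=0.1, mu=0.1, t=100.0, availability=lambda x: 0.9)
print(cost, cost / 100.0)
```

Dividing the result by $t$ gives the cost per unit time that the illustrative examples tabulate.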
Illustrative Examples

Now we numerically study the behavior of $M_h(t)$, $M_s(t)$, $C(t)/t$ and $A_{av}(t)$ as a function of $\gamma$ and $\mu$. The values of $A(x)$, $A_{av}(t)$, $M_s(t)$ and $M_h(t)$ are computed from Equations (A2.43), (A2.44), (A2.45), and (A2.49), respectively. The values of $C(t)$ are given by Equation (A3.15).

Let us consider a system with $N = 10$, $p_s = .9$, $p_h = .9$, $\beta = .01$, and $\lambda = .05$. For $t = 100$, $\gamma = .02$ to $1.0$ and $\mu = 0.01$ to $0.50$, the values of $A_{av}(t)$, $M_s(t)$ and $M_h(t)$ are given in Table A3.5. We note that the average availability improves with both the hardware and the software repair rates. Also, the expected number of failures increases with increase in $\gamma$ and $\mu$. This is because the system is up for a larger share of the time, which leaves more time for failures to occur.

Note that for these data sets all software errors have been removed by approximately $t = 500$. As pointed out earlier, after this happens, the system behaves as a hardware only system. To see how $C(t)/t$ behaves as a function of $\gamma$ and $\mu$, let us suppose that $c_{s1} = 10$, $c_{h1} = 10$, $c_{s2} = 10$, $c_{h2} = 10$, $c_{s3} = c_{h3} = 10$, and $t = 100$ to $2000$ as shown in Table A3.9. As can be easily seen, the cost varies with both $\gamma$ and $\mu$. As an example, for $t = 500$, $\mu = 0.10$, $C(t)/t$ goes from 5.97 to 12.23 as $\gamma$ goes from 0.02 to 1.00. The minimum seems to occur around $\gamma = 0.10$. Similarly, for $\gamma = 0.10$, $t = 500$, $C(t)/t$ goes from 9.07 to 7.63 as $\mu$ goes from 0.01 to 0.50 with the minimum occurring at around $\mu = 0.10$. A similar behavior is seen for other $t$ values.


AVERAGE  AVAILABILITY  AND  EXPECTED  NUMBER  OF  FAILURES 


0.010  0.1840  0.1892  0.1923  0.1943  0.1956  0.1961 
0.050  0.4650  0.4965  0.5170  0.5310  0.5410  0.5447 
0.100  0.5856  0.6356  0.6690  0.6923  0.7093  0.7156 
0.250  0.6868  0.7569  0.8041  0.8369  0.8606  0.8694 
0.500  0.7242  0.3030  0.8559  0.8923  0.9185  0.9281 
AVERAGE AVAILABILITY AND EXPECTED NUMBER OF FAILURES


0.010  10.8926  13.7340  15.0810  15.8670  16.3820  16.5617 
0.050  12.5663  15.8642  17.4242  18.3335  18.9290  19.1366 
0.100  12.7755  16.1304  17.7171  18.6418  19.2473  19.4585 
0.250  12.9010  16.2902  17.8928  18.8268  19.4383  19.6516 
0.500  12.9429  16.3434  17.9514  18.8885  19.5020  19.7160 
EXPECTED TOTAL COST PER UNIT TIME
HARDWARE-SOFTWARE SYSTEM

c_s1 = 10, c_h1 = 10, c_s2 = 10, c_h2 = 10, c_s3 = c_h3 = 10

t = 100

SOFTWARE
REPAIR               HARDWARE REPAIR RATE
RATE
Figure A3.9. Surface of Average Availability vs. Repair Rates: Hardware-Software System (β = .01, λ = .05, t = 500).